Many methods for the statistical design and analysis of integrated circuits have been proposed
over the past years. However, these methods typically require a large number of computationally
expensive circuit simulator runs, and their applications are limited to small circuits.

This research investigated new approaches for the statistical design and analysis of MOS
integrated circuits. This work has resulted in a new and efficient circuit performance modeling
approach to statistical design. The proposed approach approximates the circuit performances,
such as gain and delay, by fitted models of the inputs to the circuit simulator. The
computationally inexpensive fitted models are then used as surrogates of the circuit simulator
to predict and optimize the parametric yield and to achieve off-line quality control. The use
of statistical design and analysis of experiments for model construction has been investigated
theoretically and experimentally, and different methods to assess the adequacy of a fitted
performance model have been studied.



ACKNOWLEDGEMENTS 


I  would  like  to  express  my  appreciation  and  gratitude  to  Professor  Sung-Mo  Kang,  my 
dissertation  advisor,  for  his  invaluable  guidance  throughout  the  course  of  this  research.  With 
his  great  personality,  he  provided  support  and  encouragement  in  all  respects  of  my  life 
throughout  my  graduate  study.  I  would  also  like  to  thank  my  co-advisors  Professors  Ibrahim 
N.  Hajj  and  Timothy  N.  Trick  for  their  guidance  and  encouragement. 

I am grateful to Professors William J. Welch and Jerome Sacks for their advice and valuable
discussions on statistical design and analysis techniques.

I also thank Dave Smart, Su Shun Lin, Seung Hoon Lee, and other members of the Circuits
and Systems Group at the Coordinated Science Laboratory, University of Illinois, for
many interesting discussions.

Finally,  I  would  like  to  thank  my  mother,  Edna,  and  a  very  personal  friend,  Ng  Kam 
Yau,  for  their  love  and  support. 

The financial support of the Semiconductor Research Corporation, the Joint Services
Electronics Program, and AT&T Bell Laboratories is acknowledged. The use of word
processing facilities at Motorola Inc. for the final preparation of this thesis is also
acknowledged.



TABLE OF CONTENTS

1. INTRODUCTION

2. STATISTICAL MODELING OF THE MOS TRANSISTOR

2.1. Best- and Worst-case Analyses

2.2. Critical MOSFET Parameters

2.3. Discussion

3. PARAMETRIC YIELD PREDICTION

3.1. Previous Parametric Yield Prediction Methods

3.1.1. Monte Carlo methods

3.1.2. Statistical performance modeling method

3.2. Uncertainty Analysis

3.3. Proposed Parametric Yield Prediction Method

3.4. Yield Prediction Examples

3.5. Concluding Remarks

4. PARAMETRIC YIELD OPTIMIZATION

4.1. Proposed Parametric Yield Maximization Method

4.2. Application to Yield Maximization of CMOS Analog Circuits

4.3. Concluding Remarks

5. PARAMETER DESIGN METHOD FOR OFF-LINE QUALITY CONTROL

5.1. Taguchi’s Parameter Design Method

5.2. Proposed Off-line Quality Control Method

5.3. Example: Minimization of Process Dependent Clock Skew

5.3.1. Modeling the circuit performances

5.3.2. Modeling the loss statistics directly

5.3.3. Results and comparisons

5.4. Discussion

6. CONCLUSIONS

APPENDIX A. THE DESIGN OF EXPERIMENT

A.1. Introduction

A.2. Integrated Mean-squared Error Criterion

A.3. Example of Design Point Selection by ACED

A.3.1. All-variance design

A.3.2. All-bias design

A.3.3. Selection of a robust design

A.4. Design and Analysis of Computer Experiments

A.5. Latin Hypercube Design

APPENDIX B. MODEL ASSESSMENT

B.1. A Statistical F-test Procedure for Model Assessment

B.2. Assessing the Goodness of Fit

REFERENCES

VITA



CHAPTER  1. 


INTRODUCTION 

During the past decade, the feature sizes of the Metal Oxide Semiconductor (MOS)
transistors in Very Large Scale Integrated (VLSI) circuits have been scaled down rapidly, and
this trend is expected to continue into the 1990’s. Despite the technological progress in
patterning the features of the MOS transistors, the statistical variations in the MOSFET
parameters, such as the channel threshold voltage and gate oxide thickness, have not been scaled
down in proportion. As a result, the circuit performances become even more sensitive to
these uncontrollable statistical variations. To ensure the manufacturability of a circuit,
statistical variabilities must be considered in the design procedure.

A problem in the statistical design of MOS integrated circuits is the modeling of the
device parameter distribution. Traditionally, circuit simulations at the "best-" and "worst-case"
process files have been used to evaluate the "range" of the circuit performance
variations. Best- and worst-case circuit simulations do not estimate the distributions of the
circuit performances, however. In Chapter 2 we review the best- and worst-case approach,
and present an improved model of the MOSFET parameters [1]. The proposed model
assumes the statistical variations in the circuit performances are mainly due to a small subset
of critical parameters. It models the non-critical device parameters as functions of the critical
parameters. This model of the MOSFET parameters [1], incorporated into our circuit design
algorithms, significantly reduces the complexity of the statistical analysis.

The critical bottleneck in statistical circuit design is the high cost of the circuit
simulations. Existing design methods typically require a large number of runs. In this thesis we
present a new design approach which significantly reduces the simulation cost. Our approach
assumes that the circuit performances, such as gain or delay, can be approximated by
computationally inexpensive functions of the inputs to the circuit simulator. These inputs are
the designable circuit parameters, the operating conditions, and the distribution of the device
parameters. We fit the functions to data collected from a statistically designed experiment
incurring relatively few runs of the circuit simulator. These fitted models act as computationally
inexpensive surrogates of the circuit simulator for performance prediction. The applications
of our design approach to parametric yield prediction, yield optimization, and off-line quality
control are presented in Chapters 3, 4, and 5, respectively.
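The workflow described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: `simulate_delay` is a hypothetical stand-in for an expensive circuit simulator run, the five-run design and the first-order model are chosen for brevity, and none of the numbers come from the thesis.

```python
import random

def simulate_delay(vth_shift):
    """Hypothetical stand-in for one expensive simulator run (delay in ns)."""
    return 2.0 + 1.5 * vth_shift + 0.4 * vth_shift ** 2

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1 * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return my - b1 * mx, b1

# A small designed experiment: five evenly spaced runs over the range of interest.
design = [-0.2, -0.1, 0.0, 0.1, 0.2]
runs = [simulate_delay(x) for x in design]
b0, b1 = fit_line(design, runs)

def surrogate(vth_shift):
    """Cheap fitted model used in place of the simulator."""
    return b0 + b1 * vth_shift

# Largest fitting error at the design points (a crude adequacy check).
max_err = max(abs(surrogate(x) - simulate_delay(x)) for x in design)

# The surrogate can now be evaluated many times at negligible cost.
rng = random.Random(0)
samples = [rng.gauss(0.0, 0.05) for _ in range(10000)]
mean_delay = sum(surrogate(x) for x in samples) / len(samples)
```

The point of the sketch is the division of labor: only five calls go to the expensive function, while the ten thousand statistical evaluations use the fitted model.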

Circuit designers often measure a circuit’s quality by parametric yield, which is the
percentage of the functionally good chips that satisfy the performance constraints. Parametric
yield prediction is important because it assesses the profitability of a design before a large
amount of resources is invested in the manufacturing. In Chapter 3 we examine the existing yield
prediction methods and some related research in statistics and then present our improved yield
prediction algorithm [2,3]. (Unless stated otherwise, the term yield refers to the parametric
yield in the rest of the thesis. Other causes of yield loss are explained in [4].) The yield
prediction methods in [1,5] model each performance by an approximating function of the
uncontrollable statistical variations, and then the Monte Carlo method is used with the fitted
models to predict yield. However, the data are collected by empirical methods in [1,5], and
no attempt has been made to assess the prediction capabilities of the fitted models. In the
proposed method, we fit the performance models to data generated according to a statistically
designed experiment, and introduce systematic procedures to assess the prediction capabilities
of the fitted models. Monte Carlo simulation with the fitted models leads to an estimate of
the parametric yield. We illustrate through circuit examples the experimental design and
model assessment procedures.

Parametric yield optimization is an extension of yield prediction. For a known distribution
of the input device parameters, the problem is to maximize the parametric yield with
respect to the designable circuit parameters, such as the MOSFET channel widths and aspect
ratios. Gradient methods for yield maximization [6] compute yield gradients with respect to
the designable circuit parameters, and then use steepest ascent to optimize the parametric
yield. Data from many runs of the circuit simulator are needed to compute a single yield
gradient, and many gradient iterations are required. As a result, yield gradient methods typically
require a very large number of runs, and their applications are limited to small circuits.

In Chapter 4 we extend the method in Chapter 3 to parametric yield optimization [7,8].
We model each circuit performance as a function of all parameters of interest: the designable
circuit parameters, the statistical variations, and the operating conditions. Data generated
according to a single experimental design for all parameters are used to identify and fit the
model. For fixed values of the designable parameters, the Monte Carlo yield is estimated with
the fitted models. The estimated yield is numerically optimized. We give circuit examples
where sufficiently accurate yield estimates and good actual circuit designs can be achieved
with about 100 circuit simulator runs.
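As a rough illustration of this idea, the sketch below estimates yield from a fitted model for each candidate value of one designable parameter and keeps the best candidate. Everything here is an assumption made for the example: `fitted_delay` is a hypothetical fitted model, the specification and noise level are invented, and a plain grid search stands in for whatever numerical optimizer is actually used.

```python
import random

def fitted_delay(w, u):
    """Hypothetical fitted model: delay vs. designable width w and variation u."""
    return 4.0 / w + 0.5 * w + 0.8 * u

def estimated_yield(w, noise, spec=3.2):
    """Monte Carlo yield (percent) for fixed w, evaluated on the fitted model only."""
    ok = sum(1 for u in noise if fitted_delay(w, u) <= spec)
    return 100.0 * ok / len(noise)

rng = random.Random(1)
# The same noise samples are reused for every w, so candidates are compared fairly.
noise = [rng.gauss(0.0, 0.25) for _ in range(5000)]

candidates = [1.0 + 0.1 * i for i in range(41)]  # w from 1.0 to 5.0
best_w = max(candidates, key=lambda w: estimated_yield(w, noise))
best_yield = estimated_yield(best_w, noise)
```

Because every yield evaluation uses the fitted model, the cost of the search does not grow with the number of simulator runs; those are spent once, when the model is fitted.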

Taguchi’s parameter design method for off-line quality control [9] has generated great
interest in the engineering and statistics literature. Instead of maximizing the parametric
yield, Taguchi [9] uses statistical design and analysis of experiments to design products
(circuits) that are insensitive to variations in the manufacturing process and/or the
environmental conditions. Like the yield gradient methods, Taguchi’s design method typically
requires many (circuit simulator) runs. Data from many runs are collapsed to estimate a single
"signal-to-noise ratio," just as the yield gradient methods collapse data when calculating
the gradients.

In Chapter 5 we adapt our performance modeling method to the off-line quality control
problem. Again, we approximate the circuit performances by functions of all the inputs to the
circuit simulator, collect data from a statistically designed experiment, and fit the performance
models. For a fixed set of designable parameters, the fitted models are used to predict the
Taguchi loss statistic, rather than parametric yield. The loss statistic is numerically optimized
with respect to the designable parameters. We give a circuit example in which the Taguchi
objectives are met with about one-third of the runs that Taguchi’s method requires. Finally, in
Chapter 6 we provide some conclusions along with some suggestions for future research.
CHAPTER 2.


STATISTICAL  MODELING  OF  THE  MOS  TRANSISTOR 


A major goal of statistical circuit design is to predict, from a known distribution of the
inputs to the circuit simulator, the distribution of the output circuit performances. In MOS
integrated circuits these statistical inputs are the device parameters, such as threshold voltage
and oxide capacitance. Typically, empirical distributions of the device parameters are
obtained by test structure measurements and parameter extraction. Statistical circuit analysis by
direct Monte Carlo circuit simulator runs is computationally too expensive, however. To
analyze a circuit with a small number of runs, the distribution of the device parameters should
be sampled effectively.

The traditional best- and worst-case analyses approach [10] to statistical circuit design
estimates the "range" of the performances by circuit simulations at the "best-" and "worst-case"
files. Best- and worst-case analyses do not estimate the performance distribution,
however. In this chapter, we present a new statistical model to represent the distribution of the
device parameters [11]. The model in [11] assumes that inter-die variations in the circuit
performances are due to a small subset of critical parameters, and models the parameter
correlations by fitted quasi-physical equations. This screening and modeling of MOSFET
parameters [11] significantly reduces the problem dimension. Section 2.1 outlines the best- and
worst-case analysis method. The statistical model of the device parameters is described in
Section 2.2. A discussion in Section 2.3 concludes this chapter.
2.1. Best- and Worst-case Analyses

The best- and worst-case analyses approach is the traditional method for statistical circuit
analysis. It estimates the "range" of the performances from a small number of circuit
simulations (usually three to five runs). We denote the device parameters, such as threshold
voltage and oxide capacitance, by $P = (p_1, \ldots, p_q)$. The commonly used methods to obtain the
best- and worst-case files are

(a) One-at-a-time method: Each $p_i$ that leads to a better or worse performance is chosen
independently. Typically, the chosen value is the mean of $p_i$ ± 2 or 3 standard deviations.
This method gives very conservative best- and worst-case performances, because the
probability of a MOSFET falling outside the rectangular region bounded by the best- and
worst-case $p_i$’s is extremely small.

(b) Fast and slow MOSFETs: Select "fast" and "slow" transistors from the test structures,
and extract their parameters for circuit simulation. For digital circuits, the fast and slow
MOSFETs are expected to give the best and worst performances. This method gives less
conservative best- and worst-case estimates than the first approach because it incorporates
the correlations between the $p_i$’s. The choice of the MOSFET samples remains a
problem, however.

(c) Principal component approach [12]: Transform the correlated parameters $p_1, \ldots, p_q$ into
independent variables $\xi_1, \ldots, \xi_q$ by principal components, and then select the $\xi_i$,
$i = 1, \ldots, q$, for the circuit simulation. The problem of MOSFET parameter correlations is
eliminated by the transformation.
(d) Independent process parameters [10]: Select the combinations of the manufacturing
process parameters that lead to the best- and worst-case performances, and generate the
MOSFET parameters from a process/device simulator, such as FABRICS [13]. Working
on the independent process parameters avoids the analysis of the MOSFET parameters.

Best- and worst-case circuit simulations do not provide the circuit performance
distributions. Moreover, there exist no formal methods to validate the choice of the best- and
worst-case files. Clearly, an improved representation of the device parameter distribution is needed.
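The conservatism of the one-at-a-time method in (a) can be made concrete with a short calculation. The sketch below assumes independent Gaussian parameters, which is a simplification (real device parameters are correlated, as (b) and (c) note); the function names are illustrative only.

```python
import math

def prob_inside_box(k_sigma, q):
    """P(all q independent Gaussian parameters lie within +/- k_sigma of their means)."""
    one_dim = math.erf(k_sigma / math.sqrt(2.0))  # equals 2F(k) - 1 for one parameter
    return one_dim ** q

def prob_beyond_corner(k_sigma, q):
    """P(all q parameters simultaneously exceed their +k_sigma worst-case values)."""
    one_tail = 0.5 * math.erfc(k_sigma / math.sqrt(2.0))  # equals 1 - F(k)
    return one_tail ** q

inside_3sigma_10 = prob_inside_box(3.0, 10)    # most devices fall inside the box
corner_3sigma_4 = prob_beyond_corner(3.0, 4)   # the worst-case corner is almost never reached
```

Under these assumptions, a corner file built from four parameters at three standard deviations describes a device that essentially never occurs, which is why the resulting performance range is pessimistic.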


2.2. Critical MOSFET Parameters

Yang and Chatterjee [11] observed that a small number of critical MOSFET parameters
can account for most of the variability in the circuit behavior. Four critical MOSFET
parameters were identified [11]:

• transistor channel width reduction ($\Delta W$),

• transistor channel length reduction ($\Delta L$),

• gate oxide thickness ($t_{ox}$), and

• flatband voltage ($V_{FB}$).

These critical parameters are taken to have independent Gaussian distributions. It has been
shown [14] that currents and capacitances of MOS transistors are sensitive to the four critical
parameters. Moreover, the sensitivities to the other device parameters are calculated to be at
least an order of magnitude smaller [1].
The non-critical device parameters in $P$, such as

• drain and source resistances,

• body effect coefficient, and

• channel length modulation,

are approximated by quasi-physical equations of the critical parameters. We call the set of
quasi-physical equations a statistical MOSFET model. From device physics and the charge
sharing concept [15], it is recognized that most device parameters have an inverse
dependency on the transistor channel widths and lengths. Thus, the model is

$$p_i = p_{0i} + \frac{p_{wi}}{W - \Delta W} + \frac{p_{li}}{L - \Delta L}, \tag{2.1}$$

where $W$ and $L$ are the transistor channel width and length, $p_{0i}$ is the nominal value of $p_i$,
and $p_{wi}$ and $p_{li}$ represent the $W$ and $L$ dependencies. The exceptions to Eq. (2.1) are the
transistor gain [1]:


$$\beta = \mu C_{ox} \frac{W - \Delta W}{L - \Delta L}, \tag{2.2}$$

where $\mu$ is the mobility and $C_{ox}$ is the gate oxide capacitance; and the mobility degradation

$$\theta = \frac{\theta_W (W - \Delta W)}{L - \Delta L} + \frac{\theta_L}{L - \Delta L} + \frac{\theta_{WL}}{(W - \Delta W)(L - \Delta L)}, \tag{2.3}$$

where $\theta$ is the mobility reduction due to the vertical electrical field in the channel and
surface scattering, and $\theta_W$, $\theta_L$, and $\theta_{WL}$ are the $W$ and $L$ dependency coefficients. Measured
data of the device parameters are fitted to the quasi-physical equations. The variability in a
parameter that is not explained by the model is ignored [11].
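To show how such a statistical MOSFET model is used, the sketch below draws the four critical parameters as independent Gaussians and evaluates the transistor gain of Eq. (2.2) for each draw. The means, standard deviations, mobility, and drawn dimensions are illustrative values chosen for this example, not data from [1] or [11], and $C_{ox}$ is computed from the textbook relation $C_{ox} = \epsilon_{ox} / t_{ox}$.

```python
import random

EPS_OX = 3.45e-11  # approximate permittivity of SiO2 in F/m (3.9 * 8.854e-12)

def sample_critical(rng):
    """Draw the four critical parameters; means and sigmas are illustrative only."""
    return {
        "dW": rng.gauss(0.4e-6, 0.05e-6),  # channel width reduction (m)
        "dL": rng.gauss(0.3e-6, 0.05e-6),  # channel length reduction (m)
        "tox": rng.gauss(25e-9, 1e-9),     # gate oxide thickness (m)
        "vfb": rng.gauss(-0.9, 0.03),      # flatband voltage (V), unused by the gain
    }

def transistor_gain(W, L, crit, mobility=0.06):
    """Eq. (2.2): beta = mu * C_ox * (W - dW) / (L - dL), with C_ox = eps_ox / t_ox."""
    c_ox = EPS_OX / crit["tox"]
    return mobility * c_ox * (W - crit["dW"]) / (L - crit["dL"])

rng = random.Random(2)
betas = [transistor_gain(10e-6, 2e-6, sample_critical(rng)) for _ in range(2000)]
mean_beta = sum(betas) / len(betas)
```

Only four random numbers are drawn per device; every other model parameter would be computed from them, which is what keeps the dimension of the statistical analysis small.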

Liu and Singhal [16] suggested an alternative representation of the MOSFET parameters.
They derived seven critical process parameters, $Z = (z_1, \ldots, z_7)$, from line-monitoring
measurements:

• offset in diffused line width ($D_W$),

• offset in poly width,

• oxide thickness ($t_{ox}$),

• flatband voltage ($V_{FB}$),

• substrate doping concentration,

• surface mobility ($\mu$), and

• lateral junction depth ($D_j$).

The variation of each $z_j$ is taken as a Gaussian distribution. The proposed statistical model
is [16]

$$p_i = f_i(z_1, \ldots, z_7)\, g_i(L_{EFF}) \sum_{j=1}^{7} \left( a_{ij} z_j + \frac{b_{ij}}{z_j} \right) + \varepsilon_i. \tag{2.4}$$

The model consists of four components:
(a) A function $f_i(z_1, \ldots, z_7)$ that relates a device parameter to its geometries and the
manufacturing process.

(b) A nonlinear function $g_i(L_{EFF})$ of the effective channel length $L_{EFF}$.

(c) An expression $\sum_{j=1}^{7} \left( a_{ij} z_j + b_{ij} / z_j \right)$, with unknown constants $a_{ij}$ and $b_{ij}$, from
multi-parameter linear regression [17].

(d) A random error $\varepsilon_i$ that models the statistical variation of $p_i$ that is not explained by the
first three components in Eq. (2.4). In the implementation, $\varepsilon_i$ is generated by a Gaussian
random number generator.

2.3.  Discussion 

The  quasi-physical  statistical  MOSFET  model  in  [1]  is  used  in  our  yield  prediction  and 
optimization  examples.  Little  justification  has  been  introduced  to  support  the  more  complex 
model  in  [16]. 

To  restrict  the  number  of  independent  statistical  variables,  the  parameter  variations  within 
a  die  are  neglected  in  [1]  and  [16].  Intra-die  variations  can  be  important  in  high  performance 
analog  circuits  that  require  closely  matched  devices.  The  statistical  analysis  of  circuits  with 
parameter  mismatches  is  considered  in  [18]. 

More investigation is needed to evaluate the validity of the statistical MOSFET models.
The critical parameters in [1,16] are selected subjectively, and rigorous methods have not
been used. A survey of screening methods and some examples are given in [19]. The stepwise
regression and rank regression methods in [20,21] may be useful.
CHAPTER 3.


PARAMETRIC  YIELD  PREDICTION 

Parametric  yield  prediction  has  become  an  increasingly  important  problem  in  the  design 
of  MOS  integrated  circuits.  In  the  last  decade,  VLSI  device  feature  sizes  have  decreased 
dramatically.  However,  the  statistical  variations  of  the  device  parameters  have  not  been 
scaled  down  in  proportion.  As  a  result,  the  circuit  performances  become  even  more  sensitive 
to  statistical  variations,  which  may  lead  to  low  parametric  yield.  To  ensure  acceptable  yield, 
these  statistical  variations  must  be  considered  in  the  circuit  design  procedure. 

A major bottleneck in parametric yield prediction is the high cost of the circuit
simulation. The classical Monte Carlo method [22] is very expensive because it predicts the yield
from a large number of circuit simulator runs. To reduce the simulation cost, a statistical
modeling method is introduced in [1]. This approach approximates the circuit performances,
such as gain and propagation delay time, by fitted models of four critical MOSFET
parameters. These fitted models, estimated from five circuit simulations, act as computationally
inexpensive surrogates of the circuit simulator in the Monte Carlo simulation. The study in [1]
found many examples in which this performance modeling approach predicts yields fairly
accurately. However, many important aspects in statistical modeling, such as model assessment
and selection of the inputs to run the circuit simulator, were not considered in [1].

In this chapter we present an improved circuit performance modeling method for
parametric yield prediction. As in [1], our method assumes that the circuit performances can
be approximated by fitted models of a small subset of critical parameters, and performs Monte
Carlo simulation with these fitted models to predict yield. Unlike [1], the data in our method
are collected according to a statistically designed experiment, which ensures that all the relevant
statistical information is gathered from a small number of circuit simulator runs. Furthermore,
we assess the prediction capability of each fitted model before it is used to predict
yield. Section 3.1 reviews the Monte Carlo and the statistical performance modeling methods.
Section 3.2 summarizes the related research in statistics. Section 3.3 outlines our strategy.
Section 3.4 illustrates our method with circuit examples. The discussion in Section 3.5
concludes this chapter.


3.1.  Previous  Parametric  Yield  Prediction  Methods 

The parametric yield of a MOS VLSI circuit depends on a large number of factors: the
designable parameters, the operating conditions, the distribution of the device parameters, and
the specified performance criteria. The designable parameters, such as the drawn transistor
channel widths and lengths, are fixed in parametric yield prediction. (The case in which
parametric yield is optimized with respect to the designable parameters is considered in
Chapter 4.) The empirical distributions of the device parameters, such as channel threshold
voltage and body effect coefficient, are usually obtained from test structure measurements.

We denote the circuit performances, such as gain or propagation delay time, by
$Y = (y_1, \ldots, y_d)$. The device parameters are denoted by $P = (p_1, \ldots, p_q)$. To simplify
notation we do not introduce separate symbols for the operating conditions, but they are used later
in Chapter 4.
Given a set of constraints on the performances, let $I(y_1, \ldots, y_d) = 1$ if all the
constraints are met, and 0 otherwise. The parametric yield is the percentage of the manufactured
circuits with $I = 1$:

$$\Phi = \int I[\, y_1(P), \ldots, y_d(P) \,] \, d\Theta(P) \times 100\%, \tag{3.1}$$

where $\Theta$ is the distribution of $P$.

Existing  methods  for  yield  prediction  usually  can  be  classified  into  Monte  Carlo 
methods  [18,22,23]  or  statistical  modeling  methods  [1,5]. 

3.1.1.  Monte  Carlo  methods 

Monte Carlo circuit simulation, or simple random sampling, is probably the most widely
used method for parametric yield prediction [18,22]:

STEP 1: Generate a (large) number of samples of the device parameters $P_i$, $i = 1, \ldots, N_{MC}$,
from $\Theta$.

STEP 2: Simulate the circuit performance $Y(P_i)$, for $i = 1, \ldots, N_{MC}$.

STEP 3: Calculate the predicted yield as the percentage of samples that are acceptable:

$$\hat{\Phi} = \frac{1}{N_{MC}} \sum_{i=1}^{N_{MC}} I(Y(P_i)) \times 100\%. \tag{3.2}$$

The confidence limits on $\Phi$ are estimated from [23]:

$$\mathrm{Prob}\left( |\hat{\Phi} - \Phi| \le \gamma\, \sigma(\hat{\Phi}) \right) = 2F(\gamma) - 1, \tag{3.3}$$

where

$$\sigma^2(\hat{\Phi}) = \frac{\Phi (1 - \Phi)}{N_{MC}} \tag{3.4}$$

is the variance of the yield estimate, and $F(\gamma)$ is the cumulative standard Gaussian
distribution. Note that the standard error of $\hat{\Phi}$ is inversely proportional to the square root of
the sample size $N_{MC}$.
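The three steps and the confidence limits of Eqs. (3.3) and (3.4) can be sketched as follows. This is a minimal illustration, not the procedure of [18,22]: the performance function, the specification, and the single standardized device parameter are hypothetical, the yield is kept as a fraction rather than a percentage, and the unknown true yield in Eq. (3.4) is replaced by its estimate.

```python
import math
import random

def monte_carlo_yield(performance, passes, sample, n_mc, rng):
    """Steps 1-3: sample P, evaluate Y(P), return the fraction of samples that pass."""
    good = sum(1 for _ in range(n_mc) if passes(performance(sample(rng))))
    return good / n_mc

def yield_std_error(phi_hat, n_mc):
    """Square root of Eq. (3.4), with the estimate substituted for the true yield."""
    return math.sqrt(phi_hat * (1.0 - phi_hat) / n_mc)

rng = random.Random(3)
n_mc = 4000
phi_hat = monte_carlo_yield(
    performance=lambda p: 1.0 + 0.5 * p,  # hypothetical delay model (ns)
    passes=lambda delay: delay <= 1.5,    # hypothetical specification
    sample=lambda r: r.gauss(0.0, 1.0),   # one standardized device parameter
    n_mc=n_mc,
    rng=rng,
)
sigma = yield_std_error(phi_hat, n_mc)
# gamma = 1.96 makes 2F(gamma) - 1 approximately 0.95 in Eq. (3.3).
low, high = phi_hat - 1.96 * sigma, phi_hat + 1.96 * sigma
```

Halving the width of the interval requires four times as many samples, which is the practical meaning of the square-root dependence on $N_{MC}$.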

The advantages of simple Monte Carlo circuit simulation are

(i) It states the confidence limits on the predicted yield, $\hat{\Phi}$.

(ii) The sample size $N_{MC}$ is independent of the number of input parameters. As a result, all
the device parameters can be varied in the circuit simulation, at no extra cost.

(iii) It makes no restrictive assumption on the distribution of the device parameters, $\Theta(P)$.

(iv) It makes no restrictive assumption on the relations between the inputs and outputs of the
circuit simulator.

Due  to  the  high  cost  of  circuit  simulation,  the  Monte  Carlo  method  is  too  expensive  for  larger 
circuits.  Another  disadvantage  of  Monte  Carlo  analysis  is  that  it  does  not  give  a  physical 
relationship  between  the  device  parameters  and  yield.  Nonetheless,  we  will  treat  the  yield 
estimated  from  Monte  Carlo  circuit  simulation  as  the  "actual"  yield  to  assess  the  relative  accu¬ 
racy  of  the  other  approaches. 

Hocevar  et  al.  [23]  studied  variance  reduction  methods  to  reduce  the  sample  size.  The 
strategies  considered  are  correlated  sampling,  importance  sampling,  control  variates,  and 
stratified  sampling.  No  practical  and  general  reduction  technique  has  been  found  [23]. 
3.1.2. Statistical performance modeling method

Cox, Yang, Mahant-Shetti, and Chatterjee [1] proposed a statistical performance modeling
approach. Although the number of device parameters is large, it has been shown in
Section 2.2 that only a small subset is critical. The uncontrollable inter-chip variations in the
critical parameters are assumed to be independent random variables. For convenience they
are taken as Gaussian. Device parameter variations within a chip are assumed to be negligible.

We denote the four critical parameters in [1] by $U = (u_1, \ldots, u_4)$. The method involves
four steps [1]:

STEP 1: Select five points $U_1, \ldots, U_5$ and simulate the circuits at these points. The device
parameters $P$ are treated as functions of the four critical parameters (see
Section 2.2).

STEP 2: Assume models

$$y_k = \beta_{0k} + \sum_{i=1}^{4} \beta_{ik} u_i, \quad k = 1, \ldots, d, \tag{3.5}$$

where $\beta_{0k}, \ldots, \beta_{4k}$ are unknown constants. Fit the data to model (3.5) to obtain
the approximation $\hat{y}_k(U)$.

STEP 3: The yield body $A$ is a region defined by the fitted models: $U \in A$ if $\hat{Y}(U)$ satisfies
the constraints. The parametric yield is computed by numerical integration

$$\hat{\Phi} = \int_{U \in A} d\Gamma(U) \times 100\%, \tag{3.6}$$

where $\Gamma(U)$ is the distribution of $U$.

STEP 4: For a more accurate estimate of $\Phi$, perform extra simulations at points close to the
boundary of $A$, and repeat steps 2 and 3.
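Steps 1 to 3 can be sketched as below for a single performance. Several choices are assumptions made for this example and are not taken from [1]: the five points are the nominal point plus one perturbation per critical parameter, the inputs are standardized Gaussians, `simulate` is a hypothetical stand-in for the circuit simulator, and the integral over the yield body is approximated by sampling the fitted model rather than by a quadrature rule.

```python
import random

def simulate(u):
    """Hypothetical stand-in for one circuit simulation; u holds 4 standardized inputs."""
    return 5.0 + 0.6 * u[0] - 0.4 * u[1] + 0.3 * u[2] + 0.1 * u[3]

def fit_linear_model(sim, step=1.0):
    """Fit y = b0 + sum_i b_i * u_i exactly from five runs (nominal plus one per input)."""
    nominal = [0.0, 0.0, 0.0, 0.0]
    b0 = sim(nominal)
    slopes = []
    for i in range(4):
        point = list(nominal)
        point[i] = step
        slopes.append((sim(point) - b0) / step)
    return b0, slopes

b0, slopes = fit_linear_model(simulate)

def predict(u):
    """Fitted first-order model of the performance."""
    return b0 + sum(b * x for b, x in zip(slopes, u))

# Estimate the probability mass of the yield body {u : predict(u) <= 6.0}.
rng = random.Random(4)
n = 20000
inside = sum(1 for _ in range(n) if predict([rng.gauss(0, 1) for _ in range(4)]) <= 6.0)
yield_percent = 100.0 * inside / n
```

With five coefficients and five runs the fit is exact at the design points, so the data carry no information about how well the model predicts elsewhere.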

Visvanathan [5] also advocated the performance modeling approach for statistical circuit
analysis, and implemented his method in the CENTER/ADVICE program. The inputs to the
circuit simulator are denoted by $\xi_1, \ldots, \xi_m$. These $\xi_i$’s are the operating conditions and the
critical process parameters in [16]. The assumed models are [5]

$$y_k = \beta_{0k} + \sum_{i=1}^{m} \beta_{ik} \xi_i + \sum_{i=1}^{m} \beta_{iik} \xi_i^2, \quad k = 1, \ldots, d, \tag{3.7}$$

where $\beta_{0k}$, $\beta_{iik}$, etc. are unknown constants. The models are fitted to data from $2m+1$ circuit
simulator runs: one run at the nominal values of $\xi_1, \ldots, \xi_m$, and $2m$ runs where each $\xi_i$ is
varied to its "extreme" values.
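For a model with linear and pure quadratic terms only, the $2m+1$ runs determine the coefficients in closed form. The sketch below assumes each input is coded so that its nominal value is 0 and its extremes are -1 and +1; the function names and the test function are hypothetical and are not taken from CENTER/ADVICE.

```python
def fit_one_at_a_time_quadratic(sim, m):
    """Fit y = b0 + sum b_i x_i + sum b_ii x_i^2 from 2m+1 runs at coded levels -1, 0, +1."""
    nominal = [0.0] * m
    y0 = sim(nominal)  # one run at the nominal point gives b0 directly
    linear, quadratic = [], []
    for i in range(m):
        high = list(nominal)
        high[i] = 1.0
        low = list(nominal)
        low[i] = -1.0
        y_high, y_low = sim(high), sim(low)
        linear.append((y_high - y_low) / 2.0)          # b_i
        quadratic.append((y_high + y_low) / 2.0 - y0)  # b_ii
    return y0, linear, quadratic

def example_sim(x):
    """Hypothetical performance with linear and pure quadratic terms only."""
    return 2.0 + 3.0 * x[0] - 1.0 * x[1] + 0.5 * x[0] ** 2 + 0.25 * x[1] ** 2

b0, linear, quadratic = fit_one_at_a_time_quadratic(example_sim, m=2)
```

Because only one input moves at a time, this design cannot estimate any interaction between two inputs; a performance with a cross term would be fitted without any indication of the missing effect.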

The circuit performance modeling method [1,5] is attractive because it requires only a
small number of circuit simulator runs, and the fitted models estimate the sensitivity
coefficients $\partial y_k / \partial u_i$. However, these aspects in statistical modeling are not considered
in [1,5]:

(a) The statistical design of experiments has not been used to select the inputs to run the
circuit simulator (see Appendix A).

(b) The models are fitted from the minimum number of runs, and no attempt is made to
assess their goodness of fit (see Appendix B).

(c) The effects of representing the device parameters as functions of four critical MOSFET
parameters are not known.


3.2.  Uncertainty  Analysis 

In statistics, the analysis of the responses (outputs) of a computer code when the inputs
are subjected to statistical variations is called uncertainty analysis [19] or sensitivity
analysis [20,21]. Parametric yield prediction can be considered an uncertainty analysis
problem. The circuit performance models in [1] are commonly known as response surface
models [24]. The equivalent terminologies in VLSI design and statistics are listed in Table
3.1.

Iman, Helton, and Campbell [20,21] considered the sensitivity analysis of a computer
code known as the "pathways-to-man" model. As in parametric yield prediction, their goal is
to predict, from a known distribution of the inputs, the distribution of the responses from the
computer code. They suggested the following analysis procedure: (i) Identify a (small) subset
of critical inputs by Latin hypercube sampling [25]. (ii) Construct response surface
models of the outputs by the stepwise regression and rank regression methods [17]. (iii) Generate
the distribution of the responses by Monte Carlo simulation with the fitted (response surface)
models.

Table 3.1 Equivalent terminology in yield prediction and the response surface method.

    VLSI Design                                    Statistics

    1. circuit simulation                          computer experiment
    2. circuit simulation plan                     design of experiment
    3. circuit performance                         response, output variable
    4. inputs to the circuit simulator             predictors, input variables,
                                                   experimental factors
    5. designable parameters                       controllable factors
    6. device parameters                           noise factors, uncontrollable
                                                   parameters, statistical variations
    7. macro-model, statistical performance model  empirical model, regression model,
                                                   response surface model

Downing, Gardner, and Hoffman [19] also examined the response surface approach for
the uncertainty analysis of computer codes. Their analysis method is similar to the one
in [20]: (i) The critical inputs are identified by a fractional factorial experiment [76] that
varies all the input parameters. (ii) Each response of interest is approximated by a function of
the critical inputs, while the non-critical inputs to the simulation code are fixed at their
nominal values. Typically the approximating function is a second-order polynomial. (iii)
The approximating function is fitted to data collected according to a central composite
design [26]. The goodness of fit of a function to data is assessed by the R^2-statistic (see
Appendix B). The study in [19] showed that the response surface approach predicts the
distributions of the responses fairly accurately in certain examples. However, it is emphasized
that response surfaces should be used with caution, because it is difficult to evaluate the
effects of dropping the non-critical inputs.
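As an illustration of steps (ii) and (iii), the sketch below builds a central composite design for k critical inputs, counts the coefficients of a full second-order polynomial, and computes the R^2-statistic. It is a generic sketch, not the procedure of [19]: the function names are hypothetical, and the axial distance defaults to the common "rotatable" choice (2^k)^(1/4), which is an assumption made here.

```python
from itertools import product


def central_composite_design(k, alpha=None, n_center=1):
    """2^k factorial corners, 2k axial points at +/-alpha, and n_center center points."""
    if alpha is None:
        alpha = (2 ** k) ** 0.25  # assumed default: rotatable axial distance
    corners = [list(p) for p in product((-1.0, 1.0), repeat=k)]
    axial = []
    for i in range(k):
        for s in (-alpha, alpha):
            point = [0.0] * k
            point[i] = s
            axial.append(point)
    center = [[0.0] * k for _ in range(n_center)]
    return corners + axial + center


def n_second_order_terms(k):
    """Intercept, k linear, k squared and k(k-1)/2 cross-product coefficients."""
    return (k + 1) * (k + 2) // 2


def r_squared(observed, predicted):
    """Fraction of the total variation in the data explained by the predictions."""
    mean = sum(observed) / len(observed)
    ss_total = sum((y - mean) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


ccd = central_composite_design(3)
print(len(ccd), n_second_order_terms(3))
```

With three critical inputs the design has 15 runs for 10 coefficients, so some degrees of freedom remain for assessing the fit.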


3.3.  Proposed  Parametric  Yield  Prediction  Method 

The  proposed  method  combines  the  device  parameter  modeling  method  [1]  and  the 
response  surface  method  [24].  The  statistical  MOSFET  models  (device  parameter 
equations)  [1]  allow  us  to  reduce  the  dimension  of  the  problem  without  compromising  the 
accuracy  of  the  yield  prediction.  We  prefer  the  systematic  response  surface  method  over  the 
empirical  curve-fitting  strategy  in  [1]. 

Denote the critical parameters by U and the circuit performances by y_k, k = 1, ..., d. The
broad strategy of our yield prediction algorithm involves four steps:

STEP 1: Assume model

    y_k = f_k(U) + error,   k = 1, \ldots, d,   (3.8)

for each circuit performance of interest, y_k.
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STEP  2:  Design  an  experiment  and  simulate  the  circuits  at  the  experimental  design  points. 


The  non-critical  device  parameters  are  treated  as  functions  of  the  critical  inputs. 

STEP 3: Fit the models (3.8) to obtain the approximation \hat{y}_k(U), and improve the model if
necessary.

STEP 4: The parametric yield is estimated by Monte Carlo simulation. First generate a
(large) sample, U_i, i = 1, ..., N_{MC}, from the distribution of U. Then compute

    \hat{y}_k(U_i),   i = 1, \ldots, N_{MC},   k = 1, \ldots, d.   (3.9)

The estimated parametric yield is the percentage of samples that are acceptable:

    \hat{\Phi} = \frac{1}{N_{MC}} \sum_{i=1}^{N_{MC}} I(\hat{y}_1(U_i), \ldots, \hat{y}_d(U_i)) \times 100\%,   (3.10)

where I = 1 if all the performance constraints are met, and 0 otherwise.

Detailed implementation of our experimental design and model assessment procedures
is given in the circuit examples.
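STEP 4 can be sketched in a few lines. The example below borrows, purely for illustration, the fitted linear equations reported for the inverter chain in Table 3.3 and the Gaussian input distribution with standard deviation 1/2 from Example 1. The performance limits (3.0 mW and 20 ns) and the function names are hypothetical; they are not constraints stated in the text.

```python
import random


def power_mw(u):
    """Fitted power equation as reported in Table 3.3 (a)."""
    return 2.35 - 0.19 * u[0] + 0.29 * u[1] + 0.43 * u[2] - 0.21 * u[3]


def delay_ns(u):
    """Fitted delay equation as reported in Table 3.3 (b)."""
    return 16.6 + 0.3 * u[0] - 2.5 * u[1] - 3.6 * u[2] + 1.2 * u[3]


def estimate_yield(n_mc, max_power, max_delay, sigma=0.5, seed=0):
    """Percentage of Monte Carlo samples whose predicted performances meet both limits."""
    rng = random.Random(seed)
    accepted = 0
    for _ in range(n_mc):
        u = [rng.gauss(0.0, sigma) for _ in range(4)]
        if power_mw(u) <= max_power and delay_ns(u) <= max_delay:
            accepted += 1
    return 100.0 * accepted / n_mc


# Hypothetical constraints: power <= 3.0 mW and delay <= 20 ns.
print(estimate_yield(10000, max_power=3.0, max_delay=20.0))
```

Each sample costs only two polynomial evaluations, which is why a large N_MC is affordable once the models are fitted.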

3.4.  Yield  Prediction  Examples 

Our method for statistical performance modeling and parametric yield prediction is illustrated
through the following examples.

Example 1. NMOS Chain of Inverters

We consider a chain of 20 NMOS inverters. The goals are to assess the adequacy of the
statistical MOSFET model with four critical parameters and to fit an accurate prediction
model of the performances. The modeled performances are

    y_1 = the power dissipation [27],
    y_2 = the output delay time.

The critical parameters are

    u_1 = channel width reduction (\Delta W),
    u_2 = channel length reduction (\Delta L),
    u_3 = gate oxide capacitance (C_{ox}),
    u_4 = channel flatband voltage (V_{fb}).

The factors u_1, ..., u_4 are normalized to the range [-1, 1], and they are distributed as
four independent Gaussian variables with mean zero and \sigma = 1/2. We denote the
four-dimensional experimental region [-1, 1]^4 by R.

The assumed model is

    y_k = \beta_{0k} + \sum_{i=1}^{4} \beta_{ik} u_i + error,   k = 1, 2,   (3.11)

where \beta_{0k} and \beta_{ik} are unknown regression coefficients.

Denote the design points by U_i, i = 1, ..., N_{obs}. At a design point U_i, the device
parameters are

    p_{ij} = g_j(U_i) + e_{ij},   j = 1, \ldots, q,   (3.12)

where g_j is a systematic function that approximates the dependency of a parameter on U, and
e_{ij} is a Gaussian random number that models the variation not determined by U. We
introduced this random error into the experiment to assess the sufficiency of u_1, ..., u_4. In [1]
there is no residual random variation, and e_{ij} = 0.

The number of runs in the experiment is fairly arbitrary: a minimum of 5 runs is needed
to estimate the five coefficients of (3.11), and 3 (replicated) runs are taken at the center of the
design region R to assess potential lack of fit. The ACED package [28] was used to obtain the
design (see Section A.1). This package can construct experiments according to various
optimality criteria. The mean-squared error criterion in ACED, which addresses both the
sampling error and the bias in prediction arising from model inadequacy, is appropriate. The
robust design criterion, which chooses a compromise between the sampling and bias errors, is
used. The ACED-generated experimental design is listed in Table 3.2, along with the data.


Table 3.2 Experimental design and data of inverter chain (four critical parameters).

    Run    u_1    u_2    u_3    u_4    y_1 (mW)    y_2 (ns)

    1      -1     -1     -1     +1     1.62        24.1
    2      -1     +1     +1     +1     3.08        10.8
    3      -1     +1     +1     -1     3.50        10.1
    4       0      0      0      0     2.32        15.2
    5       0      0      0      0     2.35        15.8
    6       0      0      0      0     2.22        16.7
    7      +1     -1     -1     -1     1.69        21.5
    8      +1     -1     +1     +1     2.10        17.6
    9      +1     +1     -1     +1     1.83        19.7
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We  fit  the  data  to  model  (3.11)  with  the  RSREG  procedure  in  SAS  [29]  and  assess  the 
prediction capability of the fitted models, \hat{y}_1 and \hat{y}_2, by the F-test procedure in [30] (see
Section B.1). Tables 3.3 (a) and (b) list the regression statistics, along with the fitted
equations. The F-statistics estimate the ratios between the variations "explained" by a model and
the error. They are used to test if

(a) the "range" of values predicted by a fitted model is substantially larger than the standard
error,

(b) the lack of fit is insignificant.

First, condition (a) is checked at a significance level \alpha = 0.05: whether the variation
"explained" by \hat{y}_1 is at least ^ times larger than its standard error. In this case, the
F-statistic of the fitted model (F_1 = 148.86) is substantially larger than the critical value of the
corresponding noncentral F-distribution (F_{1,cr} = F'_{0.05;4,4,4} = 29.8). This suggests that the
fitted linear equation of u_1, ..., u_4 adequately accounts for the variations in y_1.

Next, we check condition (b) for the adequacy of Eq. (3.11). The lack-of-fit F-statistic
(F_2 = 1.5) is substantially smaller than the critical value of the corresponding F-distribution
(F_{2,cr} = F_{0.05;2,2} = 19.0), suggesting that the lack of fit is insignificant. We conclude that the
model is adequate (case 1 in Section B.1). Following the same procedure, we also conclude that
the fitted delay equation is adequate.
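The same two checks can be reproduced for the delay from the sums of squares and degrees of freedom reported in Table 3.3 (b). The sketch below is only the arithmetic of the two F-ratios and the comparison against the critical values quoted in the text (29.8 and 19.0); it does not compute those critical values, and the function names are hypothetical.

```python
def f_ratio(ss_num, df_num, ss_den, df_den):
    """Ratio of two mean squares, each a sum of squares divided by its degrees of freedom."""
    return (ss_num / df_num) / (ss_den / df_den)


def assess(model, error, lack_of_fit, pure_error, f1_crit, f2_crit):
    """Return F_1, F_2 and the outcome of conditions (a) and (b)."""
    f1 = f_ratio(*model, *error)
    f2 = f_ratio(*lack_of_fit, *pure_error)
    return f1, f2, f1 > f1_crit, f2 < f2_crit


# (sum of squares, degrees of freedom) pairs as reported in Table 3.3 (b).
f1, f2, range_ok, fit_ok = assess(
    model=(163.5, 4),
    error=(5.06, 4),
    lack_of_fit=(3.912, 2),
    pure_error=(1.147, 2),
    f1_crit=29.8,   # critical value quoted in the text
    f2_crit=19.0,   # critical value quoted in the text
)
print(round(f1, 1), round(f2, 2), range_ok, fit_ok)
```

Both conditions hold for the delay model, which corresponds to case 1 in Section B.1.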

The model predicted a maximum (worst-case) power of 3.5 mW in R, with a 95%
confidence limit of ±0.2 mW. At this point, the parametric yield can be estimated by Monte
Carlo simulation with the fitted models. This procedure is described in the next example.

Table 3.3 (a) ANOVA table for the power of inverter chain (four critical parameters).

    Source         SS        df    MS        F-ratio         Test Statistic

    Model          3.122     4     0.781     F_1 = 148.8     F_{1,cr} = 29.8
    Error          0.021     4     0.005
    Lack of Fit    0.0126    2     0.0065    F_2 = 1.125     F_{2,cr} = 19.0
    Pure Error     0.0083    2     0.0042

    \hat{y}_1 = 2.35 - 0.19 u_1 + 0.29 u_2 + 0.43 u_3 - 0.21 u_4

Table 3.3 (b) ANOVA table for the delay of inverter chain (four critical parameters).

    Source         SS        df    MS        F-ratio         Test Statistic

    Model          163.5     4     40.88     F_1 = 32.3      F_{1,cr} = 29.8
    Error          5.06      4     1.265
    Lack of Fit    3.912     2     1.956     F_2 = 3.412     F_{2,cr} = 19.0
    Pure Error     1.147     2     0.573

    \hat{y}_2 = 16.6 + 0.3 u_1 - 2.5 u_2 - 3.6 u_3 + 1.2 u_4
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Example  2.  Critical  Path  of  32 -Bit  Domino  CMOS  ALU 

The second example is the critical path of a 32-bit domino CMOS ALU circuit shown in
Figure 3.1 [31]. The circuit performances of interest are:

    y_1 = the power dissipation,
    y_2 = the output delay time.

Table 3.4 shows the experimental design chosen by ACED, along with the data. The
assumed model is identical to (3.11). Tables 3.5 (a) and (b) show the regression statistics,
along with the fitted equations.

In the case of power dissipation, the F-tests show that the range of values predicted by \hat{y}_1
is much larger than the error (F_1 = 36.1 > F_{1,cr} = 29.8), and the lack of fit is insignificant
(F_2 = 2.0 < F_{2,cr} = 19.0). The test results suggest that the model is adequate (case 1 in
Section B.1).

In the case of output delay time, the F-tests show that the range of values predicted by
\hat{y}_2 is not substantially larger than the standard error (F_1 = 20.9 < F_{1,cr} = 29.8), and the lack
of fit is insignificant (F_2 = 1.2 < F_{2,cr} = 19.0). We conclude:

• The fitted linear equation of u_1, u_2, u_3, u_4 is not an adequate predictor of the delay time.

• The linear equation should approximate the actual delay adequately.

• The error due to uncertainties in the fitted model is important.


Therefore, we conclude that either the number of observations N_{obs} is too small or that
the set of critical parameters used is incomplete (case 2 in Section B.1).

Figure 3.1 Critical path through the 32-bit domino CMOS ALU.

Table 3.4 Experimental design and data of ALU (four critical parameters).

    Run    u_1    u_2    u_3    u_4    y_1 (mW)    y_2 (ns)

    1      -1     -1     -1     +1     6.12        80.7
    2      -1     +1     +1     +1     7.32        51.0
    3      -1     +1     +1     -1     5.67        40.8
    4       0      0      0      0     6.04        62.5
    5       0      0      0      0     6.20        65.1
    6       0      0      0      0     5.75        55.6
    7      +1     -1     -1     -1     5.12        80.8
    8      +1     -1     +1     +1     9.16        96.2
    9      +1     +1     -1     +1     6.25        61.2

It is observed that the domino CMOS circuit uses a long chain of NMOS transistors
connected in series; the effect of the back-gate bias on the output delay time should be
significant. However, the substrate doping, which strongly affects the body-effect
coefficient, is not included in the four-parameter model (3.11). Therefore, we include a fifth
critical MOSFET parameter

    u_5 = substrate doping (N_{sub}).

The assumed model is

    y_k = \beta_{0k} + \sum_{i=1}^{5} \beta_{ik} u_i + error,   k = 1, 2.   (3.13)
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Table  3.5  (a)  ANOVA  table  for  the  power  of  ALU  (four  critical  parameters). 


    Source         SS        df    MS        F-ratio         Test Statistic

    Model          11.02     4     2.76      F_1 = 36.1      F_{1,cr} = 29.8
    Error          0.305     4     0.08
    Lack of Fit    0.202     2     0.101     F_2 = 2.0       F_{2,cr} = 19.0
    Pure Error     0.103     2     0.051

    \hat{y}_1 = 6.19 + 0.46 u_1 - 0.39 u_2 + 1.06 u_3 + 0.95 u_4

Table 3.5 (b) ANOVA table for the delay of ALU (four critical parameters).

    Source         SS        df    MS        F-ratio         Test Statistic

    Model          2243.0    4     560.8     F_1 = 20.9      F_{1,cr} = 29.8
    Error          107.2     4     26.8
    Lack of Fit    58.5      2     29.3      F_2 = 1.2       F_{2,cr} = 19.0
    Pure Error     48.7      2     24.4

    \hat{y}_2 = 64.5 + 6.2 u_1 - 15.9 u_2 + 1.6 u_3 + 6.4 u_4

An experimental design for u_1, ..., u_5 with N_{obs} = 11 runs is generated by ACED.
Table 3.6 shows the experimental design, along with the data. The fitted models and the
regression statistics are shown in Table 3.7.

Table 3.6 Experimental design and data of ALU (five critical parameters).

    Run    u_1    u_2    u_3    u_4    u_5    y_1 (mW)    y_2 (ns)

    1      -1     -1     +1     -1     -1     5.76        61.0
    2      -1     -1     +1     +1     +1     7.42        71.4
    3      -1     +1     -1     -1     -1     4.76        46.9
    4      -1     +1     -1     +1     +1     6.30        57.2
    5       0      0      0      0      0     6.40        68.4
    6       0      0      0      0      0     5.78        59.4
    7       0      0      0      0      0     6.29        65.1
    8      +1     -1     -1     -1     +1     5.27        90.6
    9      +1     -1     -1     +1     -1     6.30        87.5
    10     +1     +1     +1     -1     +1     5.97        52.0
    11     +1     +1     +1     +1     -1     7.10        50.6

Table 3.7 (a) ANOVA table for the power of ALU (five critical parameters).

    Source         SS        df    MS        F-ratio         Test Statistic

    Model          5.44      5     1.09      F_1 = 23.8      F_{1,cr} = 23.2
    Error          0.23      5     0.05
    Lack of Fit    0.012     3     0.004     F_2 = 0.04      F_{2,cr} = 19.2
    Pure Error     0.217     2     0.109

    \hat{y}_1 = 6.12 + 0.05 u_1 - 0.08 u_2 + 0.45 u_3 + 0.67 u_4 + 0.13 u_5

Table 3.7 (b) ANOVA table for the delay of ALU (five critical parameters).

    Source         SS        df    MS        F-ratio         Test Statistic

    Model          1978.2    5     395.6     F_1 = 46.7      F_{1,cr} = 23.2
    Error          42.4      5     8.9
    Lack of Fit    1.1       3     0.4       F_2 = 0.02      F_{2,cr} = 19.2
    Pure Error     41.3      2     20.7

    \hat{y}_2 = 64.5 + 5.5 u_1 - 13.0 u_2 - 5.9 u_3 + 2.0 u_4 + 3.2 u_5

In this case F_1 = 46.7 is larger than F_{1,cr} = 23.2, and F_2 = 0.02 is smaller
than F_{2,cr} = F_{0.05;3,2} = 19.2 for the delay, which indicates that the delay model is now
adequate with five parameters. It should be noted that the statistical variation of N_{sub} is
technology dependent and, for some technologies, the variation of N_{sub} may be negligible.

The distribution of the output delay time, obtained from 10,000 Monte Carlo runs of the
fitted model, is shown in Figure 3.2. The number N_{MC} = 10,000 is chosen simply to ensure a
sufficient number of samples. The computer time to evaluate 10,000 sample points with the
fitted models is very small compared with the time for the circuit simulation. An empirical
distribution of the delay, obtained from 300 runs of the circuit simulator with all the input
MOSFET parameters varying, is shown in Figure 3.3. The moments and percentage points of
the two distributions are listed in Table 3.8. We use the distribution from the 300 circuit
simulator runs as the "true" delay distribution because it is the best approximation that is
available.

The fitted model predicts the distribution of the delay accurately. The differences
between the means and standard deviations of the delay distributions are small (5%). However,
the distribution from 10,000 runs with the model is symmetric and normal, whereas the
distribution from 300 circuit simulator runs has a skewness of 0.8. These discrepancies can
be explained in part by the higher-order effects not modeled in (3.13) and declared
insignificant by the lack of fit test.

The differences between the 75, 90, 95 and 99 percentile points of the delay distributions
are consistently less than 10%, and there is a maximum error of +5.9% at the 99 percentile.

The parametric yields for various delay constraints are estimated from the performance
distributions. The yield difference between the regression-model-based and
circuit-simulator-based simulations for delay constraints of 65 ns, 70 ns and 75 ns is
consistently less than 12%.

Table 3.8 Delay distribution of ALU.

                        Monte Carlo with      Monte Carlo with
                        Circuit Simulator     Fitted Model

    No. samples         300                   10000
    Mean                61.4                  64.5
    Std. Dev.           8.96                  8.48
    Variance            80.2                  71.9
    Skewness            0.81                  0.0
    Kurtosis            1.05                  0.0
    Median              60.3                  64.5
    75 percentile       66.3                  70.2
    90 percentile       73.1                  75.4
    95 percentile       77.6                  78.5
    99 percentile       90.0                  84.8
    \hat{\Phi}_{65} †   75%                   64%
    \hat{\Phi}_{70}     87%                   83%
    \hat{\Phi}_{75}     95%                   94%

    † \hat{\Phi}_T is the predicted parametric yield when the delay constraint is T (ns).

Example 3. CMOS 4-Bit Full Adder

In this example we consider a CMOS 4-bit full adder consisting of 112 transistors. The
modeled performance is

    y = the output delay time of the most significant bit.

The five critical parameters u_1, ..., u_5 are identical to those in Example 2. ACED is used to
design a 16-run experiment of 5 factors. The delay distributions generated by 10,000 runs
with the fitted model and 300 runs of the circuit simulator varying all the device parameters
are compared in Table 3.9. It can be observed that the errors in the worst-case delay
estimation are consistently less than 5% and the differences in yield estimates are less than 8%.


3.5.  Concluding  Remarks 

In  this  chapter,  an  improved  circuit  performance  modeling  method  for  parametric  yield 
prediction  is  presented.  It  is  shown  that  our  method  predicts  yield  accurately  with  a  relatively 
small  number  of  circuit  simulator  runs. 

We  introduced  the  response  surface  method  for  the  building  of  the  circuit  performance 
models.  Our  method,  based  on  the  design  and  analysis  of  experiments,  allows  accurate 
models  of  the  circuit  performances  to  be  fitted  from  a  small  number  of  circuit  simulator  runs. 
Experimental  design  also  allows  the  assessment  of  the  predictive  capabilities  of  the  fitted 
models  by  a  statistical  F-test  procedure. 

A systematic procedure is introduced to verify whether the four critical parameters
indeed cause most of the variations in the circuit performances. Two statistical F-tests are
used to compare the variations due to the critical parameters with the variations due to other
sources. As demonstrated through the 32-bit ALU circuit example, the four critical
parameters {\Delta W, \Delta L, C_{ox}, V_{fb}} may not always be sufficient. In this example, engineering
insight has been used to identify the fifth critical parameter N_{sub}. This empirical method of
parameter screening could be avoided if systematic screening methods were used (see Section 2.3).

Based on our experience with these and other examples, a linear equation often adequately
approximates the performance of digital circuits. With a sufficient set of critical parameters
and careful modeling, the parametric yield and the distribution of the performances can be
predicted accurately.

In the rest of the thesis we will assume the critical parameters to be sufficient. The other
device parameters will be treated as deterministic functions of the critical parameters only.
This eliminates the sampling errors from the fitted models. The design of
computer-simulation experiments that have no random error is considered in Section A.2.

Table 3.9 Delay distribution of four-bit full adder.

                        Monte Carlo with      Monte Carlo with
                        Circuit Simulator     Fitted Model

    No. samples         300                   10000
    Mean                78.3                  81.3
    Std. Dev.           13.3                  14.9
    Variance            176.3                 220.6
    Skewness            0.45                  0.0
    Kurtosis            0.0                   0.0
    Median              78.1                  81.4
    75 percentile       86.8                  91.3
    90 percentile       97.6                  100.3
    95 percentile       101.9                 105.5
    99 percentile       114.6                 115.2
    \hat{\Phi}_{80} †   55%                   47%
    \hat{\Phi}_{90}     81%                   73%
    \hat{\Phi}_{100}    93%                   90%

    † \hat{\Phi}_T is the predicted parametric yield when the delay constraint is T (ns).


CHAPTER  4. 


PARAMETRIC  YIELD  OPTIMIZATION 


In this chapter we extend the circuit performance modeling method to parametric yield
optimization. As in parametric yield prediction, the bottleneck in yield optimization is the
cost of the circuit simulation. Over the past years many approaches have been proposed for
yield optimization, but they typically require many circuit simulator runs, and their practical
applications are limited.

Yield gradient methods for parametric yield maximization [6,32,33] compute gradients
with respect to the designable circuit parameters and then use steepest ascent to optimize the
yield. Gradients only represent the local behavior of the yield function, however, and a
possibly poor local maximum may be found (unless the yield function is simple). Yield gradient
methods typically require a large number of circuit simulations. Data from many runs of the
circuit simulator are required to estimate a single yield gradient, and several gradient
computations are needed in the optimization.

In our approach, we model each circuit performance as a function of all parameters of
interest: the designable parameters, the statistical variations, and the operating conditions.
Data generated according to a statistical experiment are used to identify and fit these models.
For fixed values of the designable parameters, the Monte Carlo yield is estimated with the
approximating models acting as computationally cheap surrogates for the circuit simulator, as in
Chapter 3. This estimated yield is numerically optimized. Section 4.1 outlines the steps in
our strategy, and Section 4.3 illustrates the detailed implementation in two CMOS analog
circuit examples. Finally, Section 4.4 presents some concluding remarks.

4.1.  Proposed  Parametric  Yield  Maximization  Method 

The parametric yield of a MOS VLSI circuit depends on a large number of factors: the
designable parameters, the operating conditions, the distribution of the uncontrollable
statistical variations, and the specified performance criteria. The designable circuit parameters,
such as the drawn transistor channel lengths and the aspect ratios of the P- and N-channel
transistors, can be specified by the circuit designers. We model the distribution of the device
parameters by a small subset of critical parameters, as in Chapter 3. The device parameters are
treated as functions of the critical parameters only, and there is no randomness in the
experiment.

We denote the circuit performances of interest, such as gain or propagation delay time,
by Y = (y_1, ..., y_d). Let X = (x_1, ..., x_p) denote the varying input parameters to the circuit
simulator, all other inputs remaining fixed. In circuit simulation, each x_i can be manipulated
to represent controllable adjustment of a designable circuit parameter and/or uncontrollable
statistical variation. We write x_i = c_i + u_i to differentiate between the controllable and
uncontrollable portions. If an input parameter has no designable adjustment, then c_i has a
fixed value and is ignored. Similarly, if there is no uncontrollable variation, then u_i = 0.
Each y_k is, therefore, a function of X = C + U, where C = (c_1, ..., c_p) and
U = (u_1, ..., u_p).

Given a set of constraints on the performances, let I(y_1, ..., y_d) = 1 if all the
constraints are met, and 0 otherwise. For given controllable parameters C, the parametric yield is
the percentage of the manufactured circuits with I = 1:

    \Phi(C) = \int I[y_1(C+U), \ldots, y_d(C+U)] \, d\Gamma(U) \times 100\%,   (4.1)

where \Gamma is the distribution of U.

The broad strategy for yield maximization involves six steps. The examples will
illustrate details of their implementation.

STEP 1: Assume models for the performances as functions of the circuit simulator
inputs X = C + U:

    y_k = f_k(X) + error.   (4.2)

The circuit simulator is deterministic, and the error term in (4.2) represents
systematic departure of the assumed model f_k(X) from the actual performance.

STEP 2: Design an experiment and simulate the circuits at the experimental design points.

STEP 3: Fit the models (4.2) to obtain the approximation \hat{y}_k(X). Check if the model fits
well, and improve the model if necessary. For example, a circuit performance that
is too complex for accurate modeling can sometimes be identified as a composite
function of several subcircuit performances; this is exploited in Section 4.2.

STEP 4: The parametric yield of a circuit design C, defined in Eq. (4.1), is estimated by
Monte Carlo simulation. First, generate a (large) sample, U_i, i = 1, ..., N_{MC}, from
the distribution \Gamma of the uncontrollable variations. Then, compute

    \hat{y}_k(C + U_i),   i = 1, \ldots, N_{MC},   k = 1, \ldots, d.   (4.3)

The estimated parametric yield is the percentage of samples that are acceptable:

    \hat{\Phi}(C) = \frac{1}{N_{MC}} \sum_{i=1}^{N_{MC}} I[\hat{y}_1(C + U_i), \ldots, \hat{y}_d(C + U_i)] \times 100\%,   (4.4)

where I is defined in Eq. (4.1).

STEP 5: Maximize the estimated yield \hat{\Phi}(C) with respect to C, and denote the resulting
circuit design by C*. We use the same Monte Carlo sample at each iteration of the
optimization algorithm. Because \hat{\Phi}(C) is not necessarily a smooth function of C,
the simplex optimization algorithm [34] is used for optimization.

STEP 6: Obtain a "confirmatory" estimate of the yield at C*. We use the method described
in [3], which predicts the yield at C* from a new, smaller experiment. The use of
the circuit simulator for a Monte Carlo estimate of the yield at C* would require
an impractical number of runs.
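Steps 4 and 5 can be sketched as follows. The two quadratic "fitted models" below are invented for illustration and are not taken from the examples, and only their acceptance limits mimic the form of a gain and a delay constraint. A coarse grid search is swapped in for the simplex algorithm [34] to keep the sketch short. What the sketch does preserve is the reuse of one Monte Carlo sample for every candidate design C, which makes the estimated yield a deterministic function of C.

```python
import random

SIGMA = 1.0 / 3.0  # standard deviation of each normalized variation (assumed)


def gain_like(x):
    """Hypothetical fitted model of a performance that must exceed 1000."""
    return 1200.0 + 300.0 * x[0] - 150.0 * x[1] - 100.0 * x[0] * x[0]


def delay_like(x):
    """Hypothetical fitted model of a performance that must stay below 170."""
    return 150.0 + 40.0 * x[0] - 20.0 * x[1] + 10.0 * x[1] * x[1]


def acceptable(x):
    return gain_like(x) > 1000.0 and delay_like(x) < 170.0


def draw_sample(n_mc, dim, seed=0):
    """One fixed Monte Carlo sample of the uncontrollable variations U."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, SIGMA) for _ in range(dim)] for _ in range(n_mc)]


def estimated_yield(c, sample):
    """Eq. (4.4): percentage of samples with X = C + U_i meeting all constraints."""
    ok = sum(1 for u in sample if acceptable([ci + ui for ci, ui in zip(c, u)]))
    return 100.0 * ok / len(sample)


def grid_search(sample, steps=9):
    """Stand-in optimizer: evaluate the estimated yield on a grid over [-1, 1]^2."""
    levels = [-1.0 + 2.0 * i / (steps - 1) for i in range(steps)]
    best_c, best_yield = None, -1.0
    for c1 in levels:
        for c2 in levels:
            y = estimated_yield((c1, c2), sample)
            if y > best_yield:
                best_c, best_yield = (c1, c2), y
    return best_c, best_yield


sample = draw_sample(2000, 2)
print(estimated_yield((0.0, 0.0), sample), grid_search(sample))
```

Because the sample is fixed, differences in estimated yield between two designs reflect the designs themselves rather than sampling noise, which is what allows a direct-search optimizer to compare candidates reliably.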

In the examples below, at steps 1 and 3 we use quadratic regression models fitted to the
data by least squares. At step 2 the inputs are selected by a Latin hypercube; such designs
were suggested by [25] for computer-simulation experiments. These simple tools lead to
adequate accuracy here.

Polynomial models may not fit so well in other examples with more complex performance
functions. Also, the errors in model (4.2) are systematic and not the random errors
usually assumed when fitting by least squares [17]. Alternative implementations for the
modeling, experimental design, and estimation steps are described in [35].

4.2.  Application  to  Yield  Maximization  of  CMOS  Analog  Circuits 
Example  A.  CMOS  comparator  / 

A CMOS two-stage comparator circuit is shown in Figure 4.1 [36]. The performances
of interest and their initial constraints are

    y_1 = dc gain (> 1000),
    y_2 = propagation delay time (T_1 < 170 ns),
    y_3 = propagation delay time (T_2 < 170 ns).

The transient analysis of a comparator circuit is shown in Figure 4.2. The input voltages are
denoted by V_+ and V_-. The time intervals T_1 and T_2 are the differences between the
switching times of the input V_+ and the output voltage.

The parameters used to represent the process variations and their [-3\sigma, +3\sigma] ranges are

    u_1 = PMOS channel length reduction (0.1 µm \le \Delta L_P \le 0.7 µm),
    u_2 = PMOS flatband voltage (0.5 V \le V_{fb,P} \le 0.8 V),
    u_3 = NMOS channel length reduction (0.1 µm \le \Delta L_N \le 0.7 µm),
    u_4 = NMOS flatband voltage (-0.6 V \le V_{fb,N} \le -0.4 V),
    u_5 = gate oxide thickness (37 nm \le t_{ox} \le 43 nm),
    u_6 = bias current variation (-3 µA \le \Delta I_B \le +3 µA).
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Figure  4.2  Transient  response  of  CMOS  comparator. 
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In  addition  there  is  a  single,  yet  very  demanding,  operating  condition; 

V  =  input  dc  voltage,  Vqc  (  ^dc  =  0,  or  3.5  V). 

For  a  circuit  to  be  satisfactory,  it  must  satisfy  the  constraints  for  (yj,  >'2»  >”3)  levels 

of  V. 

To maximize the yield, we identify four designable parameters and determine their
ranges empirically:

$c_6$ = nominal bias current (40 µA ≤ $I_B$ ≤ 60 µA),

$c_7$ = width of transistors M1 and M2 (150 µm ≤ W(M1, M2) ≤ 300 µm),

$c_8$ = width of transistor M7 (30 µm ≤ W(M7) ≤ 60 µm),

$c_9$ = ratio of the widths of M6 and M7 (1.6 ≤ W(M6)/W(M7) ≤ 2.2).

The parameter ranges are normalized as follows: $u_1, \ldots, u_5$, $c_7$, $c_8$, and $c_9$ to $[-1, +1]$, and $v$ to
the discrete values $\{-1, 0, 1\}$. The bias current is made up of the nominal value, $c_6$, and the
variability, $u_6$; the total range of $I_B = c_6 + u_6$ is 37 µA ≤ $I_B$ ≤ 63 µA, which is then
normalized to $[-1.3, +1.3]$. The distribution of U is that of six independent Gaussian variables with
mean zero and standard deviation $\sigma = 1/3$.
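
The normalization above can be illustrated with a short Python sketch. The helper name `to_coded` is an assumption introduced here for illustration; the numerical ranges are the ones quoted in the text.

```python
def to_coded(value, low, high):
    """Map a physical value in [low, high] linearly onto the coded range [-1, +1]."""
    center = 0.5 * (low + high)
    half_width = 0.5 * (high - low)
    return (value - center) / half_width

# The nominal bias current c6 spans 40..60 uA. Adding the +/-3 uA variability u6
# widens the total bias current to 37..63 uA, i.e. [-1.3, +1.3] in coded units.
bias_low = to_coded(37.0, 40.0, 60.0)
bias_high = to_coded(63.0, 40.0, 60.0)

# Gate oxide thickness at the middle of its 37..43 nm range maps to zero.
tox_nominal = to_coded(40.0, 37.0, 43.0)
```

Values outside the nominal range simply map outside $[-1, +1]$, which is how the combined bias current ends up on $[-1.3, +1.3]$.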

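Denote ($u_1, \ldots, u_5$, $c_6 + u_6$, $c_7$, $c_8$, $c_9$, $v$) by ($x_1, \ldots, x_{10}$). The assumed models are

$$y_k = \beta_{0k} + \sum_{i=1}^{10} \beta_{ik} x_i + \sum_{i=1}^{10} \sum_{j=i}^{10} \beta_{ijk} x_i x_j + \text{error}, \quad k = 1, 2, 3. \tag{4.5}$$

The constants $\beta$ are (unknown) regression coefficients.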
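
As a sketch of what model (4.5) implies for the regression, the following Python fragment expands one input point into the regressors of a full second-order model and counts them. The function names are illustrative assumptions, not part of the original work.

```python
from itertools import combinations_with_replacement

def quadratic_features(x):
    """Return [1, x_1..x_p, x_i*x_j for i <= j] for one input point x."""
    p = len(x)
    features = [1.0]
    features.extend(float(v) for v in x)
    for i, j in combinations_with_replacement(range(p), 2):
        features.append(float(x[i]) * float(x[j]))
    return features

def n_quadratic_terms(p):
    """Number of coefficients in a full second-order model in p inputs."""
    return (p + 1) * (p + 2) // 2

# Ten inputs x_1..x_10 give 1 + 10 + 55 = 66 coefficients per performance.
row = quadratic_features([0.0] * 10)
```

The count of 66 is the minimum number of runs needed before the coefficients of (4.5) can be estimated by least squares.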

A 100-observation Latin hypercube experiment [25] is generated for the circuit simulation:
a minimum of 66 runs is required to fit the model (4.5), and the remaining 34 degrees
of freedom permit some assessment of fit. Latin hypercube designs were developed for
computer-simulation experiments and are easy to construct. The experimental design is
generated as follows. Each $x_i$, except for $x_6$ and $x_{10}$, takes the equally spaced values
−99/100, −97/100, ..., 99/100 for the 100 runs, but in different random orders. Similarly, $x_6$ is
generated in $[-1.3, +1.3]$ and $x_{10}$ in the set $\{-1, 0, 1\}$. Thus, each variable is exercised over its
range, and the variables are matched at random. The circuit descriptions are generated from
the experimental design by iEDISON [37], and SPICE is used for circuit simulation.
Table 4.1 lists part of the experimental design and the data.
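
The construction just described can be sketched in a few lines of Python. This is a minimal illustration, not the generator used in the original work: the function name and the fixed seed are assumptions, and only the eight variables on $[-1, +1]$ are generated (the column for $x_6$ could presumably be obtained by scaling such a column by 1.3, and $x_{10}$ needs a separate discrete assignment).

```python
import random

def latin_hypercube(n_runs, n_vars, seed=0):
    """Each column holds the values -(n-1)/n, ..., (n-1)/n in an independent random order."""
    rng = random.Random(seed)
    levels = [(2 * k + 1 - n_runs) / n_runs for k in range(n_runs)]
    columns = []
    for _ in range(n_vars):
        col = levels[:]
        rng.shuffle(col)          # independent random order per variable
        columns.append(col)
    # Transpose so that each row is one simulator run.
    return [list(run) for run in zip(*columns)]

design = latin_hypercube(100, 8)
```

Every variable takes each of its 100 levels exactly once, so each one is exercised evenly over its range regardless of how the columns happen to be matched.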

For each of the three performances, the unknown coefficients in the model (4.5) are
estimated by least squares based on the 100 observed runs. Since there is no random error in
the circuit simulation, statistical testing of the fitted model is inappropriate. Nonetheless, we
estimate the goodness of fit by $R^2$ [17] and $R^2_{PRS}$. The $R^2$ statistic measures the proportion
of the variability in the data "explained" by the regression, while $R^2_{PRS}$ is a modification that
is useful for detecting possible model defects (see Appendix B).
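
The two statistics can be illustrated with a small, self-contained Python sketch. Appendix B is not reproduced here, so the definition used below is an assumption: $R^2_{PRS}$ is taken to be the leave-one-out (PRESS-based) analogue of $R^2$, in which each run is predicted from a model fitted without it. All function names are illustrative.

```python
def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z

def least_squares(X, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)

def predict(b, row):
    return sum(bi * xi for bi, xi in zip(b, row))

def r2_and_r2_press(X, y):
    """Return (R^2, PRESS-based R^2); the latter refits with each run left out."""
    n = len(y)
    mean = sum(y) / n
    ss_total = sum((yi - mean) ** 2 for yi in y)
    b = least_squares(X, y)
    ss_resid = sum((yi - predict(b, r)) ** 2 for r, yi in zip(X, y))
    press = 0.0
    for i in range(n):
        b_i = least_squares(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - predict(b_i, X[i])) ** 2
    return 1.0 - ss_resid / ss_total, 1.0 - press / ss_total

# Toy data (not from the text): a straight-line model fitted to linear and to curved data.
xs = [-1.0, -0.5, 0.0, 0.5, 1.0]
X = [[1.0, x] for x in xs]
r2_lin, r2p_lin = r2_and_r2_press(X, [2.0 + 3.0 * x for x in xs])
r2_quad, r2p_quad = r2_and_r2_press(X, [x * x for x in xs])
```

In the toy data the straight-line model is exact for the linear response, so both statistics are essentially 1. For the curved response the leave-one-out statistic drops below $R^2$ and becomes negative, the same kind of warning sign that the text uses to detect lack of fit.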

The quadratic regression model for the gain fits poorly: $R^2 = 0.914$, but
$R^2_{PRS} = -0.02$. This clear indication of lack of fit is caused by 11 points with unusually low
gains. For instance, at run number 96 the gain is $A_v = 24 \ll 1000$. We examined the circuit
simulation outputs at these anomalous observations, and found that either transistor M1 (2
cases) or M7 (9 cases) is not operating in the saturation region. The small-signal gain

$$A_v = \frac{g_{m1}}{g_{ds1} + g_{ds2}} \cdot \frac{g_{m7}}{g_{ds6} + g_{ds7}} \tag{4.6}$$

is low at these points, due to the large drain-to-source conductances of the unsaturated
transistors M1 or M7 ($g_{ds1}$ or $g_{ds7}$).

To cope with these problems, we construct two new models for the terminal voltages of
M1 and M7:

$y_4 = V_{ds1} - (V_{gs1} - V_{t1})$ ($y_4 > 0$ for M1 to be saturated),

$y_5 = V_{ds7} - (V_{gs7} - V_{t7})$ ($y_5 > 0$ for M7 to be saturated).

Since SPICE produces values for $V_{ds1}$, etc., the data can be used to fit $y_4$ and $y_5$ without
requiring new circuit simulations. The equations for $y_4$ and $y_5$ fit very well, with
$R^2$ ($R^2_{PRS}$) = 1.000 (1.000), and 0.996 (0.957); they will be used in the yield optimization to
constrain the operating regions of transistors M1 and M7.

The eleven points with poor gains ($y_4 < 0$ or $y_5 < 0$) are deleted from the dataset. The
models for the gain, $A_v$, and the delay, $T_2$, fit very well to the remaining 89 observations,
with $R^2$ ($R^2_{PRS}$) = 0.999 (0.982), and 0.998 (0.969). The model for $T_1$ does not fit as well,
however, with $R^2$ ($R^2_{PRS}$) = 0.976 (0.559). We could try to improve the model for $T_1$, but it
turns out that the $T_1$ delays are small enough that they do not affect the yield anyway.

For a given C, the parametric yield can be predicted by Monte Carlo simulation using
the fitted models, $\hat{y}_k(C + U, v)$. We generate 500 samples $U_i$, $i = 1, \ldots, 500$, from the
distribution $r(U)$. The predicted yield is

$$\hat{\Phi}(C) = \frac{1}{500} \sum_{i=1}^{500} \prod_{v=-1,0,1} I\big(\hat{y}_1(C + U_i, v), \ldots, \hat{y}_5(C + U_i, v)\big). \tag{4.7}$$

Here $I(\cdot)$ is 1 when the predicted performances satisfy all the constraints and 0 otherwise.
The performance constraints must be met for all three levels of $v$; hence the product of
indicator functions.
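
The estimate in Eq. (4.7) can be sketched in Python as follows. This is an illustration under stated assumptions: the two model functions are hypothetical stand-ins with made-up coefficients (the fitted regression models are not reproduced here), and the function and argument names are introduced only for this example.

```python
import random

def predicted_yield(models, constraints, design, n_samples=500, sigma=1.0 / 3.0, seed=1):
    """Fraction of Monte Carlo samples whose predicted performances meet every
    constraint at all three levels of the operating condition v."""
    rng = random.Random(seed)
    n_pass = 0
    for _ in range(n_samples):
        u = [rng.gauss(0.0, sigma) for _ in range(6)]
        # Product of indicator functions over v = -1, 0, 1.
        ok = all(
            meets(model(design, u, v))
            for v in (-1, 0, 1)
            for model, meets in zip(models, constraints)
        )
        n_pass += ok
    return n_pass / n_samples

# Hypothetical stand-ins for two fitted models (a gain and a delay).
def gain_model(c, u, v):
    return 1500.0 + 400.0 * c[0] + 300.0 * u[0] - 100.0 * v

def delay_model(c, u, v):
    return 120.0 - 20.0 * c[1] + 30.0 * u[1] + 10.0 * v

models = [gain_model, delay_model]
constraints = [lambda g: g > 1000.0, lambda t: t < 170.0]
estimate = predicted_yield(models, constraints, design=[0.0, 0.0])
```

Because only the cheap fitted models are evaluated, the estimate can be recomputed for many candidate designs at negligible cost compared with circuit simulation.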
The yield is optimized with respect to C using the simplex algorithm. This is computationally
inexpensive because we are using the fitted models and do not have to run the simulator.
For constraints $y_1 > 1000$, $y_2, y_3 < 170$ ns, and $y_4, y_5 > 0$, there are many designs with
very high yield. This suggests that the performance constraints may be tightened.
Re-optimizing with the new constraints on the two time delays, $y_2, y_3 < 130$ ns, produces the
design C* with $c_6 = 0.99$, $c_7 = -0.96$, $c_8 = -1.00$, and $c_9 = 0.77$. The estimated yield is
$\hat{\Phi}(C^*) = 95\%$. Further tightening of this constraint leads to drastically lower yields.
However, as the design C* has the width of M7 at its lower bound ($c_8 = -1.00$), experimentation with
smaller widths of M7 may lead to further improvement. A similar comment may apply to $c_6$
and $c_7$.

The regression models for the yield optimization may not approximate the performance
variations around C* very accurately. A confirmation experiment provides a more-accurate
estimate of the yield. The confirmation experiment follows [3]. The assumed model of the
responses is a quadratic of $u_1, \ldots, u_6$, and $v$. Data are collected according to the 48-run Latin
hypercube design varying $u_1, \ldots, u_6$ and $v$ only. Models for $y_1, \ldots, y_5$ have $R^2$ ($R^2_{PRS}$) =
1.000 (0.944), 0.995 (0.605), 1.000 (0.918), 1.000 (1.000), and 0.998 (0.928). This suggests that
the models of $y_1$, $y_3$, $y_4$, and $y_5$ fit very well. The model for $T_1$ does not fit well, but again
it does not matter. The predicted yield $\hat{\Phi}(C^*)$ obtained from 200 Monte Carlo samples with
the fitted models is 93%. Figure 4.3 is a contour plot of the predicted parametric yield as the
constraints on the gain and delays vary.

From running the circuit simulator at the same 200 Monte Carlo samples (this involves
600 circuit-simulator runs, because there are three levels of the operating condition $v$), the
yield at C* is 92%. This close agreement with 93% shows that the 48-run confirmation
experiment would have been sufficient. Moreover, the contour plot in Figure 4.4 of the yields
from the SPICE runs agrees well with that in Figure 4.3.

Example  B.  CMOS  Comparator  II 

A second CMOS comparator consisting of 53 MOS transistors is shown in Figure 4.5.
The circuit topology follows from the techniques described in [38]. The performances of
interest and their constraints are

$A_v$ = dc gain ($A_v > 30$),

$\omega_{3dB}$ = 3 dB bandwidth ($\omega_{3dB} > 60$ MHz).

In this example there are 12 parameters of interest. The first five are the uncontrollable
statistical variations $u_1, \ldots, u_5$ for the critical MOSFET parameters already described in
Example A. The distributions of the device parameters are again independent Gaussian, but their
$[-3\sigma, +3\sigma]$ ranges are different. The remaining seven parameters are designable, with no
statistical variation:

$c_6$ = width of transistors M1A and M1B
([...] µm ≤ W(M1A, M1B) ≤ [...] µm),

$c_7$ = width of transistors M2A and M2B
(4 µm ≤ W(M2A, M2B) ≤ 12 µm),

$c_8$ = ratio of the widths of M3A and M3B to M2A and M2B
(1.1 ≤ W(M3A, M3B)/W(M2A, M2B) ≤ 1.9),

$c_9$ = width of transistors M4A and M4B
(15 µm ≤ W(M4A, M4B) ≤ 70 µm),

$c_{10}$ = width of transistors M5A and M5B
(15 µm ≤ W(M5A, M5B) ≤ 25 µm),

$c_{11}$ = width of transistors M6A and M6B
(4 µm ≤ W(M6A, M6B) ≤ 12 µm),

$c_{12}$ = ratio of the widths of M7A and M7B to M6A and M6B
(1.1 ≤ W(M7A, M7B)/W(M6A, M6B) ≤ 1.9).

All the parameters are normalized to the range $[-1, +1]$. Part of a 100-run Latin hypercube
design for the 12 parameters $u_1, \ldots, u_5$, $c_6, \ldots, c_{12}$ is listed in Table 4.2, along with the
data.

Modeling the gain by a quadratic model in $u_1, \ldots, u_5$, $c_6, \ldots, c_{12}$ gives a fairly poor
predictor: $R^2 = 0.999$ but $R^2_{PRS} = 0.750$. To improve the prediction of the gain we note that
the comparator circuit consists of two stages. If $A_{v1}$ and $A_{v2}$ denote the gains at the two
stages, then

$$A_v = A_{v1} \cdot A_{v2}. \tag{4.8}$$

We therefore try modeling $A_{v1}$ and $A_{v2}$, from which $A_v$ can be predicted.

The transistors in the first and second stages are connected together at nodes 1 and 2,
and we use $V_c$ to denote the common dc voltage at these two nodes, which is a parameter
affecting $A_{v2}$. We assume that the models for

$y_1$ = dc gain of the first stage, $A_{v1}$,
$y_2$ = dc voltage at nodes 1 and 2, $V_c$,

should be functions of $u_1, \ldots, u_5$ and $c_6, \ldots, c_9$, while

$y_3$ = dc gain of the second stage, $A_{v2}$,

depends on $u_1, \ldots, u_5$, $c_9, \ldots, c_{12}$, and $V_c$.

Table 4.2 Part of 100-run Latin hypercube design for $u_1, \ldots, c_{12}$ and
the observed $y_1, \ldots, y_4$ for Example B.

In the models for $y_3$ below, $y_2$ is treated as another input parameter, and we write
($x_1, \ldots, x_{13}$) for ($u_1, \ldots, u_5$, $c_6, \ldots, c_{12}$, $y_2$). The assumed quadratic equations for $A_{v1}$
and $V_c$ are

$$y_k = \beta_{0k} + \sum_{i=1}^{9} \beta_{ik} x_i + \sum_{i=1}^{9} \sum_{j \ge i}^{9} \beta_{ijk} x_i x_j + \text{error}, \quad k = 1, 2. \tag{4.9}$$

Similarly, the assumed model for $A_{v2}$ is a quadratic in ($x_1, \ldots, x_5$, $x_9$, $x_{10}$, $x_{11}$, $x_{12}$, $x_{13}$). The
regression models of $y_1$, $y_2$, $y_3$ have $R^2$ ($R^2_{PRS}$) values of 0.999 (0.995), 1.000 (0.999), and
0.995 (0.945), respectively, suggesting very good predictive capabilities for $A_{v1}$, $V_c$, and $A_{v2}$.
We  note  that  is  strongly  affected  by  and  Cg,  while  A^2  depends  strongly  on  c  12- 

Modeling

$y_4$ = the 3 dB bandwidth, $\omega_{3dB}$,

by a quadratic model in $u_1, \ldots, u_5$, $c_6, \ldots, c_{12}$ gives a fairly good predictor: $R^2 = 0.999$
and $R^2_{PRS} = 0.851$. Attempting to improve the predictor by decomposing the circuit into two
stages is not successful for bandwidth. An approximation relating the overall bandwidth to
the bandwidths of the two stages, ignoring higher-order poles, leads to substantial error here.
Thus, we retain the quadratic model in $u_1, \ldots, u_5$, $c_6, \ldots, c_{12}$ for the bandwidth.

Monte Carlo estimates of yield for a circuit defined by C are based on 500 samples, $U_i$,
$i = 1, \ldots, 500$. The fitted models are used to make predictions of the bandwidth, the gain of
stage 1, and the voltage $V_c$. Then, the predicted voltages are fed into the model $\hat{y}_3$ to predict
the gain of stage 2. The predicted gain is computed from Eq. (4.8). The estimated yield
$\hat{\Phi}(C)$ is the percentage of samples with $A_v > 30$ and $\omega_{3dB} > 60$ MHz.
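
The cascaded prediction described above can be sketched as follows. The four toy model functions are hypothetical placeholders with invented coefficients; only the flow (predict $V_c$, feed it into the stage-2 model, multiply the stage gains, then apply the two constraints) follows the text.

```python
def overall_gain(model_av1, model_vc, model_av2, u, c):
    """Predict A_v = A_v1 * A_v2, feeding the predicted node voltage V_c into the stage-2 model."""
    av1 = model_av1(u, c)
    vc = model_vc(u, c)
    av2 = model_av2(u, c, vc)
    return av1 * av2

def yield_percent(samples, c, model_av1, model_vc, model_av2, model_bw,
                  min_gain=30.0, min_bw_mhz=60.0):
    """Percentage of samples with predicted gain and bandwidth above their limits."""
    n_pass = 0
    for u in samples:
        gain = overall_gain(model_av1, model_vc, model_av2, u, c)
        if gain > min_gain and model_bw(u, c) > min_bw_mhz:
            n_pass += 1
    return 100.0 * n_pass / len(samples)

# Hypothetical stand-ins for the four fitted models.
toy_av1 = lambda u, c: 6.0 + 0.5 * u[0]
toy_vc = lambda u, c: 1.0 + 0.1 * c[0]
toy_av2 = lambda u, c, vc: 5.0 + vc
toy_bw = lambda u, c: 70.0 - 5.0 * u[1]

samples = [[0.0, 0.0], [0.2, -0.1], [-0.3, 0.4]]
result = yield_percent(samples, [0.0], toy_av1, toy_vc, toy_av2, toy_bw)
```

Treating the intermediate voltage as an input keeps each fitted model simple, at the price of propagating any error in the predicted $V_c$ into the stage-2 gain.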

The yield is optimized with respect to C using the simplex algorithm. A design C* with
$c_6 = 0.60$, $c_7 = -1.00$, $c_8 = 0.10$, $c_9 = -1.00$, $c_{10} = 0.85$, $c_{11} = -0.98$, $c_{12} = -0.58$, and
$\hat{\Phi}(C^*) = 100\%$ is found.


For the confirmation experiment, we use a 36-run Latin hypercube design varying only
the uncontrollable parameters $u_1, \ldots, u_5$. The models for $y_1, \ldots, y_4$ have $R^2$ ($R^2_{PRS}$) = 1.000
(1.000), 1.000 (1.000), 1.000 (0.998), and 0.945 (0.649), respectively. This suggests that the
models ($y_1$, $y_2$, $y_3$) for $A_{v1}$, $V_c$, and $A_{v2}$ fit very well. The predictor of $\omega_{3dB}$ is less
accurate. Nonetheless, the yield of 99.0% estimated by the four models from 500 Monte Carlo
samples agrees well with the yield of 99.2% obtained by running the circuit simulator 500
times at the same Monte Carlo samples. Again, the small confirmation experiment is adequate.
Figure 4.6 is a contour plot of the yield (from the 500 circuit simulator runs) as the
constraints on the gain and bandwidth vary. It indicates that there is a large region where the
yield is close to 100%. This explains why our predictor of bandwidth is adequate here. The
plot also indicates that the bandwidth is not a smooth function of the inputs: there is a rapid
decrease in yield in certain parts of the region. This explains the difficulty in modeling the
bandwidth by a quadratic function. The more-flexible models suggested in [35] may lead to a
better predictor.


4.3.  Concluding  Remarks 

In this chapter we presented a new method for parametric yield optimization of MOS
VLSI circuits. By constructing computationally cheap models for the circuit performances,
we achieve high yields with relatively few simulator runs. Based on our experience with these
and other examples, we make the following observations:

1. Engineering knowledge should be used to select the most basic performances, as illustrated
by the modeling of $A_{v1}$ and $A_{v2}$ in the second example. These are more likely to admit
simple polynomial approximations of high accuracy, from which the performances of
interest can be constructed.

2. If there are ill-behaved observations, such as catastrophic failures, and an assumed model
does not fit well, the designer should identify the causes of the failures. As we demonstrated
in the first example, additional design constraints can be included to keep the yield
optimization away from undesirable regions of the designable parameters.

3. Points 1 and 2 suggest that statistical modeling should incorporate engineering insights.

4. With careful modeling, accurate approximations to crucial performances can be developed
with relatively few circuit simulations. Effective yield optimization is possible with these
computationally inexpensive models.
CHAPTER 5.


PARAMETER  DESIGN  METHOD  FOR  OFF-LINE  QUALITY  CONTROL 

Taguchi's off-line quality control method [9] for product and process improvements has
gained much interest recently in engineering. Instead of maximizing the parametric yield, the
Taguchi approach [9] employs the design and analysis of experiments to design quality "into"
products, such that they are insensitive to the manufacturing process and environmental
variations. Taguchi's approach often requires a prohibitively large number of experimental runs,
however. The Taguchi method collapses data from many (circuit simulator) runs into his
so-called "signal-to-noise ratio," just as the yield gradient method collapses data when computing
the yield gradient.

In  this  chapter  we  extend  the  yield  optimization  method  in  Chapter  4  to  achieve  off-line 
quality  control.  Again,  the  proposed  method  models  the  circuit  performance  as  a  function  of 
all  parameters  of  interest.  For  a  given  set  of  designable  parameters  the  fitted  models  predict 
the  circuit  performances  as  the  uncontrollable  parameters  vary.  This  leads  to  a  prediction  of 
the  Taguchi  loss  statistic,  instead  of  parametric  yield.  The  loss  statistic  is  then  numerically 
minimized  with  respect  to  the  designable  parameters.  We  give  a  circuit  example  where  the 
Taguchi  objectives  are  met  with  about  two-thirds  fewer  runs  than  [9]. 

The  Taguchi  method  is  described  in  Section  5.1.  The  proposed  method  is  detailed  in 
Section  5.2.  Section  5.3  compares  the  Taguchi  and  the  proposed  approach  in  a  CMOS  clock 
skew  minimization  example.  Discussion  is  given  in  Section  5.4. 
5.1. Taguchi's Parameter Design Method


Taguchi's off-line quality control methods [9] emphasize designing quality into products
(circuits), so that they are less sensitive to sources of variability. Parameter design, an important
step in off-line quality control, is the search for levels of designable (controllable) parameters
that lead to a product robust to the variability in the manufacturing process and environmental
conditions (noise). A key feature in [9] is the separation of the designable parameters
and uncontrollable statistical variation for the design of experiment. Kackar [39] and
Hunter [40] gave very readable accounts of the main ideas; Kackar and Shoemaker [41] and
Phadke [42] provided some examples.

We use $y$ to denote the circuit performance of interest, and C and U to denote the
controllable and uncontrollable variations, respectively. The distribution of U is denoted by $r(U)$.
Taguchi measures the quality of a circuit C by a loss statistic [9]:

$$L(C) = \int \big(y(C, U) - y_{target}\big)^2 \, r(U) \, dU, \tag{5.1}$$

which is the expected squared deviation of the circuit performance from its target value,
$y_{target}$. The objective of parameter design is to find C that minimizes $L(C)$.
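
A Monte Carlo estimate of the loss statistic (5.1) can be sketched in Python. The performance function below is a hypothetical toy, and the Gaussian form of $r(U)$ with $\sigma = 1/3$ is an assumption carried over from Chapter 4 for illustration only.

```python
import random

def expected_loss(performance, c, target, n_samples=2000, sigma=1.0 / 3.0,
                  n_noise=2, seed=0):
    """Monte Carlo estimate of L(C) = E[(y(C, U) - y_target)^2] for Gaussian U."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        u = [rng.gauss(0.0, sigma) for _ in range(n_noise)]
        total += (performance(c, u) - target) ** 2
    return total / n_samples

# Hypothetical performance: c[0] sets the mean, c[1] sets the sensitivity to noise.
def toy_performance(c, u):
    return c[0] + (1.0 + c[1]) * u[0]

loss_robust = expected_loss(toy_performance, [1.0, -1.0], target=1.0)
loss_sensitive = expected_loss(toy_performance, [1.0, 0.0], target=1.0)
```

Both toy designs are on target on average, but only the first is insensitive to the noise, which is exactly the distinction the loss statistic is meant to capture.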

Taguchi  optimizes  L(C)  with  respect  to  C  in  a  two-step  procedure  [40].  First,  signal- 
to-noise  ratios  are  computed  from  the  data,  which  are  then  used  to  optimize  the  loss  statistic. 
The  connection  between  the  signal-to-noise  ratio  and  the  loss  statistic  is  given  in  [43]. 

Taguchi’s  design  strategy  for  off-line  quality  control  involves  six  steps.  The  example 
will  illustrate  details  of  its  implementation. 
STEP 1: Assume a model of the loss statistic

$$L(C) = g(C, \beta) + \text{error}. \tag{5.2}$$

The model $g$ is an assumed-known function of C (for example, a polynomial
model), and $\beta$ is a vector of unknown constants to be estimated from the data.
The error term represents systematic departure from the assumed model. We
prefer to minimize $L(C)$, because it underlies the Taguchi philosophy, rather than
maximize a signal-to-noise ratio. The disadvantages of the signal-to-noise ratio
are discussed in [44].

STEP 2: Design a control array experiment for C and a noise array experiment for U. For
every C in the control array, simulate the circuits at every U in the noise array.

STEP 3: Compute estimates of the loss statistic $L(C)$ for every point in the control array.
Fit the data to model (5.2).

STEP 4: Minimize $\hat{L}(C)$ with respect to C, subject to design constraints.

STEP 5: Conduct a confirmatory experiment to evaluate $L(C)$ by fixing C at the value(s)
found in STEP 4 and varying U according to $r(U)$.

A major problem of Taguchi's approach is the large number of experimental runs due to
the crossing of the control and noise arrays. Moreover, it is often difficult to identify an
appropriate model for the loss function. In the following section, we will discuss how the
yield optimization method in Chapter 4 is adapted to achieve off-line quality control, but with
far fewer runs than the Taguchi experiment.

5.2.  Proposed  Off-line  Quality  Control  Method 
The  proposed  approach  involves  five  steps: 

STEP 1: Design a single experiment to predict the performance as a function of the
controllable parameters C and the uncontrollable statistical variables U:

$$y(C, U) = f(C, U, \beta) + \text{error}. \tag{5.3}$$

The model $f$ is an assumed-known function of both C and U, and $\beta$ is a vector of
unknown constants. We do not use separate control and noise arrays; considerable
economy in the number of observations can result from designing a single
experiment for both factors.

STEP 2: Simulate the circuits at the design points, and fit the performance model $\hat{y}(C, U)$.

For a given C predict the loss statistic $L(C)$ from the estimated response function.
For example, for a target value of zero, the predicted loss statistic (5.1) is

$$\hat{L}(C) = \int \hat{y}^2(C, U) \, r(U) \, dU. \tag{5.4}$$

In the example below, the density $r(U)$ is taken as combinations of the best- and
worst-case current files of the transistors. This is mainly for convenience.

STEP 3: Minimize $\hat{L}(C)$ as a function of C. In practice, the mathematical optimization
will be tempered by engineering and cost considerations.

STEP 4: Conduct a confirmatory experiment to evaluate $L(C)$ by fixing C at the value(s)
found in STEP 3 and varying U according to $r(U)$.

STEP 5: Iterate if necessary. For instance, if the optimization step suggests values of C
outside the design region that are technically and economically feasible, then the
design region could be shifted.
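
Steps 2 and 3 of the proposed approach can be sketched as follows, assuming a fitted performance model is already available. The surrogate used here is a hypothetical toy, the noise distribution is taken as a small discrete set of levels, and an exhaustive search over a coarse grid stands in for the numerical minimization; all names are illustrative.

```python
from itertools import product

def predicted_loss(surrogate, c, noise_levels, target=0.0):
    """Average squared deviation from target over the noise levels (discrete r(U))."""
    total = sum((surrogate(c, u) - target) ** 2 for u in noise_levels)
    return total / len(noise_levels)

def best_design(surrogate, n_factors, noise_levels, levels=(-1.0, 0.0, 1.0)):
    """Exhaustive search over a coarse grid of designable-parameter settings."""
    candidates = product(levels, repeat=n_factors)
    return min(candidates, key=lambda c: predicted_loss(surrogate, c, noise_levels))

# Hypothetical fitted model: c[0] shifts the response, c[1] controls noise sensitivity.
def toy_surrogate(c, u):
    return 0.5 * c[0] + (1.0 - c[1]) * u

noise = (-1.0, 0.0, 1.0)
c_star = best_design(toy_surrogate, 2, noise)
loss_star = predicted_loss(toy_surrogate, c_star, noise)
```

Because the loss is predicted from one model in both C and U, no simulator runs are needed inside the search; the confirmatory experiment of STEP 4 then checks the result with the simulator.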

5.3.  Example:  Minimization  of  Process  Dependent  Clock  Skew 

This example considers a CMOS clock driver [45] shown in Figure 5.1. From the master
clock, the circuit generates the output CLK and its complement $\overline{CLK}$. The clock skew is
defined to be the difference in the output delay times of the CLK and $\overline{CLK}$ signals. Because
each clock signal switches twice per machine cycle, two clock skews can be measured (in
units of nanoseconds), as illustrated in Figure 5.2.

The design objective is to determine the channel widths $\{w_1, \ldots, w_6\}$ for the transistors
in Figure 5.1 that give the smallest clock skew in the presence of device parameter
variability. To allow for quadratic effects, the experiment is carried out with each width at
three levels, denoted by −1, 0, 1.

Shoji [45] investigated the clock skew example and proposed an empirical method for
minimizing the skew. In [45], the uncontrollable statistical variabilities are represented by the
high (H), medium (M), and low (L) current files of the P- and N-channel MOSFET transistors.
These device parameter combinations are coded 1,...,5 below. For the purpose of comparison,
we will use these device parameter combinations in our experiment.
We will consider loss statistics that combine two skews. For fixed $w = (w_1, \ldots, w_6)$,
there are five pairs of skews corresponding to device parameters {PH-NH, PH-NL, PM-NM,
PL-NH, PL-NL}. Denote these skews by $S_1(w), \ldots, S_{10}(w)$. The target skew is zero, and the
logarithm of the average squared-error loss is

$$L_{sq}(w) = \log\Big[\frac{1}{10} \sum_{i=1}^{10} S_i^2(w)\Big]. \tag{5.5}$$

The logarithmic transformation, being monotonic, does not affect the optimal $w$ but is
relevant in Section 5.3.2 when $L_{sq}$ is modeled directly. A more-conservative performance
measure is the worst-case skew

$$L_{wc}(w) = \max\big[\,|S_1(w)|, \ldots, |S_{10}(w)|\,\big]. \tag{5.6}$$
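
The two loss statistics in Eqs. (5.5) and (5.6) are simple to compute once the ten skews are known. The sketch below assumes the natural logarithm, since the base is not stated in the text; the function names are illustrative.

```python
import math

def loss_sq(skews):
    """Logarithm of the average squared skew (target skew is zero), as in Eq. (5.5)."""
    return math.log(sum(s * s for s in skews) / len(skews))

def loss_wc(skews):
    """Worst-case absolute skew, as in Eq. (5.6)."""
    return max(abs(s) for s in skews)

# Made-up skew values in nanoseconds, for illustration only.
unit_skews = [1.0] * 10
mixed_skews = [0.5, -2.0, 1.0]
```

Doubling every skew adds $\log 4$ to the squared-error loss but doubles the worst-case loss, which shows how differently the two measures scale.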

5.3.1.  Modeling  the  circuit  performances 

Several  considerations  determined  the  choice  of  model  for  the  clock  skews  as  functions 
of  the  controllable  parameters  and  the  uncontrollable  statistical  variation. 

•  Because curvature and interactions cannot be ruled out, we adopt a second-order
polynomial model.

•  There are only five combinations of the uncontrollable statistical variations {PH-NH,...,
PL-NL}. For simplicity, therefore, we treat these combinations as a single qualitative
factor at five levels, 1,...,5.

•  There are two chains of transistors in Figure 5.1: (M1,...,M4) and (M5,M6). This
suggests that only seven interactions $w_1 w_2$, $w_1 w_3$, $w_1 w_4$, $w_2 w_3$, $w_2 w_4$, $w_3 w_4$, and $w_5 w_6$
involving transistors in the same chain need be considered.

•  No similar reasoning leads to a reduction in the number of interactions between the
widths and the device factor, however. Although one might suspect, for example, that
there would be no interaction between the widths of the P-channel transistors and the
component of the device factor representing variability in current-driving capabilities of
the N-channel transistors, this is not the case. The two types of transistors are
interconnected. The interconnections between the transistors allow the effects of the device
parameters to differ at various transistor sizes. Easterling [46] pointed out this
connection between interactions and robustness to sources of variability.

Thus, we model the two skews as a function of $w$ and the device-factor level $j$ by

$$\begin{aligned} S(w, j) = {} & \beta_0 + \beta_1 w_1 + \cdots + \beta_6 w_6 + \beta_{11} w_1^2 + \cdots + \beta_{66} w_6^2 \\ & + \beta_{12} w_1 w_2 + \beta_{13} w_1 w_3 + \beta_{14} w_1 w_4 + \beta_{23} w_2 w_3 + \beta_{24} w_2 w_4 \\ & + \beta_{34} w_3 w_4 + \beta_{56} w_5 w_6 + \gamma_j + \delta_{1j} w_1 + \cdots + \delta_{6j} w_6 + \epsilon. \end{aligned} \tag{5.7}$$

The unknown constants $\gamma_1, \ldots, \gamma_5$ and $\delta_{11}, \ldots, \delta_{65}$ are the main effects for the qualitative device
factor and the interaction effects between the designable and device factors. Because the
observations are derived from a deterministic circuit simulator, there is no random error, and $\epsilon$
represents systematic departure from the assumed linear model. Because not all the unknown
constants are identifiable, we arbitrarily set $\gamma_5 = \delta_{15} = \delta_{25} = \cdots = \delta_{65} = 0$. This leaves 48
unknown constants to be estimated.
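
The count of 48 unknown constants can be checked by writing out the regressors of model (5.7) for one run. The sketch below is illustrative: the function name is an assumption, and the device factor is coded with indicator variables for levels 1 to 4, with level 5 as the baseline whose effects are fixed at zero.

```python
# Index pairs (0-based) of the seven same-chain interactions: w1w2, ..., w3w4, w5w6.
SAME_CHAIN_PAIRS = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5)]

def model_row(w, level, n_levels=5):
    """Regressors of model (5.7) for widths w (length 6) and device level 1..5.
    Effects of the last level are fixed at zero for identifiability."""
    row = [1.0] + [float(wi) for wi in w] + [float(wi * wi) for wi in w]
    row += [float(w[i] * w[j]) for i, j in SAME_CHAIN_PAIRS]
    # Main effects of the device factor (gamma_1 .. gamma_4).
    for j in range(1, n_levels):
        row.append(1.0 if level == j else 0.0)
    # Width-by-device interactions (delta_1j .. delta_6j for j = 1..4).
    for j in range(1, n_levels):
        on = 1.0 if level == j else 0.0
        row += [on * wi for wi in w]
    return row

# Run 1 of Table 5.1: all widths at -1, noise level 3.
run1 = model_row([-1, -1, -1, -1, -1, -1], 3)
```

The row has 1 + 6 + 6 + 7 = 20 polynomial terms, 4 device main effects, and 6 × 4 = 24 width-by-device interactions, for 48 columns in total.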
We designed a 60-observation experiment to estimate the 48 unknown constants in
model (5.7). The number of runs is fairly arbitrary; clearly, at least 48 are needed, and we
wanted a modest number of degrees of freedom to assess potential lack of fit. Again, the
average mean-squared error criterion in the ACED package [28] was used to obtain the design
(see Section A.1). (This criterion also includes the variance arising from random error; we
weighted the bias component to be dominant because there is no random error.) The use of a
computer package such as ACED also circumvents the difficulties in designing this
experiment: only some of the interactions need to be estimated and the five-level device factor
space is not a regular factorial arrangement. The experimental design and the resulting data
are given in Table 5.1.

The two skews are separately modeled via Eq. (5.7). Least squares estimation of the
unknown constants allows us to predict the two skews at untried levels of the designable and
device factors. In the presence of systematic error rather than random error, statistical testing
is inappropriate. Nonetheless, the root mean squared errors of the least squares analyses for
the two skews are 0.03 and 0.08 (relative to data ranges of about −3.9 to 0.2 and −2.2 to 3.8).
The $R^2$ values are 1.000 and 0.999, suggesting that the models fit well. We note that

•  The first-order effects for both the designable and device factors are all large.

•  Many of the second-order (quadratic and interaction) effects are moderately large.

•  The contrast between high and low levels of the N-channel device factor is larger than
that for P. It is often found that the N-channel variability is more critical.
Table 5.1 Experimental design and data for modeling clock skews.
(Two skews are observed for each run.)

                                            Noise
Run    Transistor Width w                   Level    Skews
        w1    w2    w3    w4    w5    w6

  1     -1    -1    -1    -1    -1    -1      3     -1.289   -0.307
  2     -1    -1    -1    -1    -1     0      4     -0.636   -1.199
  3     -1    -1    -1    -1     1    -1      1     -1.219    0.907
  4     -1    -1    -1     1     1    -1      2     -1.151    1.678
  5     -1    -1    -1     1     1     1      3     -0.449   -0.422
  6     -1    -1    -1     1     1     1      5     -0.510   -0.343
  7     -1    -1     0    -1    -1    -1      5     -2.758    0.157
  8     -1    -1     1    -1    -1     1      2     -2.414   -1.309
  9     -1    -1     1     0    -1     1      5     -1.920   -1.633
 10     -1    -1     1     1    -1     1      4     -0.809   -1.546
 11     -1    -1     1     1     0     0      1     -1.227   -0.496
 12     -1     0    -1    -1     0     1      2     -1.412    0.041
 13     -1     0    -1     1     0     1      4     -0.452   -0.628
 14     -1     0     0     0    -1    -1      1     -1.127    0.062
 15     -1     0     1    -1     1     0      5     -3.860    2.011
 16     -1     0     1     0     0    -1      4     -2.107    0.862
 17     -1     1    -1    -1    -1    -1      2     -2.300    1.350
 18     -1     1    -1    -1    -1     1      3     -1.118   -0.466
 19     -1     1    -1     0    -1     0      5     -1.495    0.070
 20     -1     1    -1     1     0     1      1     -0.512   -0.236
 21     -1     1    -1     1     1    -1      4     -1.184    1.592
 22     -1     1     1    -1    -1    -1      1     -2.126    0.479
 23     -1     1     1    -1     1     1      4     -2.504    0.931
 24     -1     1     1     0     1    -1      3     -2.769    2.567
 25     -1     1     1     1     1    -1      5     -3.315    3.759
 26     -1     1     1     1     1     1      2     -1.912    1.149
 27      0    -1    -1    -1     0    -1      5     -1.927    0.365
 28      0    -1    -1     0     1     1      4     -0.452   -0.922
 29      0    -1     0     1    -1     0      5     -0.855   -2.175
 30      0    -1     1    -1     0     0      4     -1.768   -0.748

(continued)
For fixed $w$, the two models are used to predict five pairs of skews corresponding to the
five device factor levels. Denote these 10 predictions by $\hat{S}_1(w), \ldots, \hat{S}_{10}(w)$. The loss
statistics in Eqs. (5.5) and (5.6) can then be estimated by

$$\hat{L}_{sq}(w) = \log\Big[\frac{1}{10} \sum_{i=1}^{10} \hat{S}_i^2(w)\Big] \tag{5.8}$$

and

$$\hat{L}_{wc}(w) = \max\big[\,|\hat{S}_1(w)|, \ldots, |\hat{S}_{10}(w)|\,\big]. \tag{5.9}$$

The next step is to minimize either of these loss statistics with respect to $w$. Discussion
of this and the validation from confirmatory experiments will be presented in Section 5.3.3,
along with a comparison of alternative design and modeling strategies.


5.3.2.  Modeling  the  loss  statistics  directly 

For  comparison,  we  also  conducted  an  experiment  with  separate  control  and  noise  arrays, 
as  has  been  advocated  by  Taguchi  for  optimizing  through  direct  modeling  of  a  performance 
measure. 

The  choice  of  a  model  for  a  loss  statistic  L  is  problematic.  Whereas  the  engineer  may 
have  substantial  background  knowledge  concerning  the  underlying  circuit  performance, 
approximate  models  for  complex  loss  statistics  are  typically  not  so  intuitive.  In  this  example, 
when  modeling  the  skews,  there  is  an  engineering  basis  for  omitting  some  designable-factor 
interactions.  However,  this  need  not  imply  that  the  same  interactions  are  negligible  when 
modeling the loss statistics, which are nonlinear functions of the skews. Indeed, these
interactions turn out to have fairly large effects. As a simple illustration, $Y = x_1 + x_2$ has no
interaction between two factors $x_1$ and $x_2$, but the loss $Y^2$, for example, clearly does. A log
transformation is often suggested to reduce interaction effects in signal-to-noise ratios similar
to the squared error loss in Eq. (5.8) (see for example [39]). In the absence of engineering
intuition, however, we adopt a full second-order model in w for both loss statistics (though
hesitantly for the non-smooth $L_{wc}$):

$$L(w) = \beta_0 + \sum_{i=1}^{6}\beta_i w_i + \sum_{i=1}^{6}\sum_{j=i}^{6}\beta_{ij} w_i w_j \qquad (5.10)$$
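The regressors of the full second-order model can be enumerated mechanically. The sketch below (the function name `second_order_terms` is hypothetical) expands one setting of the six widths into an intercept, six linear terms, and all products $w_i w_j$ with $j \ge i$, which gives 1 + 6 + 21 = 28 columns.

```python
from itertools import combinations_with_replacement

def second_order_terms(w):
    """Expand w into the regressors of a full second-order model."""
    terms = [1.0]                                   # intercept (beta_0)
    terms.extend(w)                                 # linear terms (beta_i)
    for i, j in combinations_with_replacement(range(len(w)), 2):
        terms.append(w[i] * w[j])                   # squares and interactions, j >= i
    return terms

row = second_order_terms([-1, 0, 1, 1, 0, -1])
print(len(row))
```

Stacking one such row per control-array run gives the design matrix for a least-squares fit.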

There are 28 unknown constants, and we designed a 40-run (control array) experiment
for the designable factors, again using ACED. As when modeling the clock skews, the size
of the control array is somewhat arbitrary. A minimum of 28 runs is needed, and we allow
12 extra degrees of freedom to measure the lack of fit. Crossing the control array with the
noise array of size five leads to a total of 200 runs. Part of the experimental design and data
are given in Table 5.2. For a given w in the control array, the data generated across the
noise array are collapsed to the loss statistics $L_{sq}(w)$ and $L_{wc}(w)$ in Eqs. (5.8) and (5.9). Fitting
model (5.10) to these observed statistics provides direct predictions of $L_{sq}$ and $L_{wc}$ at
untried w’s, which can be optimized with respect to w. The root mean squared errors for
the $L_{sq}$ and $L_{wc}$ fits are 0.1 and 0.25 relative to data ranges of -0.5 to 0.8 and 1.0 to 3.9.
Fitting by least squares gives $R^2$ values of 0.824 for $L_{sq}$ in Eq. (5.8) and 0.907 for $L_{wc}$ in Eq.
(5.9). Qualitatively, then, the models do not fit quite as well as those for the skews.
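The crossed-array bookkeeping can be sketched as follows. Every control-array row is run at every noise level, and the skews from those runs are collapsed to one pair of loss values per control row. The function `toy_simulator` is a hypothetical stand-in for the circuit simulator (it is not the clock-skew circuit), the noise coding is illustrative, and the base-10 logarithm is an assumption.

```python
import math

def collapse(control_array, noise_array, simulate):
    """Cross every control row with every noise row and collapse to loss statistics."""
    results = []
    for w in control_array:
        skews = []
        for z in noise_array:
            skews.extend(simulate(w, z))            # two skews per simulator run
        l_sq = math.log10(sum(s * s for s in skews) / len(skews))
        l_wc = max(abs(s) for s in skews)
        results.append((w, l_sq, l_wc))
    return results

def toy_simulator(w, z):
    """Hypothetical stand-in for the circuit simulator; returns two skews."""
    scale = 1.0 + 0.2 * z
    return (scale * (w[0] - w[1] - 1.0), scale * (w[2] + 0.5 * w[3]))

control = [(-1, -1, -1, -1, 0, 1), (-1, -1, 1, 1, 1, 1)]   # two illustrative control rows
noise = [-2, -1, 0, 1, 2]                                   # five noise levels (illustrative coding)
table = collapse(control, noise, toy_simulator)
print(len(control) * len(noise), "simulator runs give", len(table), "rows of loss statistics")
```

With a 40-row control array and five noise levels the same loop makes 200 simulator calls and returns 40 rows, one per control setting.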

As  noted  in  Section  5.1,  this  experiment  does  not  strictly  follow  the  pattern  of  analysis  in 
the  examples  given  by  Taguchi  [9].  That  paradigm  fits  additive  models  to  the  loss  statistics 
(ignoring  interactions)  and  would  optimize  the  level  of  each  transistor  width  separately.  In 
this  example,  using  the  data  from  a  (40  x  5)-run  experiment,  such  an  approach  leads  to 
Table 5.2  Part of the crossed array experimental design and data for modeling loss
statistics directly. (Two skews are observed for each run.)

                                          Skew at Noise Level
Runs     Transistor Widths w           1      2      3      4      5

1-5      -1  -1  -1  -1   0   1    -0.73  -1.12  -1.01  -0.83  -1.30
                                   -0.76  -0.78  -0.76  -0.86  -0.78

6-10     -1  -1  -1   1   1  -1    -0.74  -1.15  -1.01  -0.88  -1.43
                                    0.60   1.68   1.16   0.72   1.97

11-15    -1  -1   0   0  -1  -1    -0.98  -1.85  -1.44  -1.03  -2.06
                                   -0.46   0.01  -0.34  -0.57  -0.13

16-20    -1  -1   1  -1   0  -1    -2.11  -3.45  -2.85  -2.39  -3.91
                                    0.39   1.22   0.79   0.47   1.43

21-25    -1  -1   1   1   1   1    -1.15  -1.44  -1.46  -1.39   1.79
                                   -0.29  -0.29  -0.19  -0.22  -0.09
extremely poor predictions (given below). Also, Taguchi’s $L_{18}$ experimental design could be
used for the designable factors. The comparison we make is more consistent with our
methods.

5.3.3.  Results  and  comparisons 

We now give results for the following strategies:

(I)  the  60-run  experiment  for  the  skew  models  (5.7),  from  which  loss  statistics  are 
predicted  indirectly,  and 

(II)  the  (40  X  5)-run,  crossed  array  experiment  for  modeling  the  loss  statistics  directly  via 
Eq.  (5.10). 

We  also  consider  a  hybrid  strategy: 

(III)  the  (40  X  5)  experiment  from  strategy  II  but  modeling  the  skews  to  predict  the  loss 
statistics  as  in  strategy  I. 

Table 5.3 gives results for the loss statistic $L_{sq}$. Listed there are the best three sets of
transistor dimensions w over the $3^6$ grid $\{-1,0,1\}^6$ as predicted by each of these three
strategies. The last column gives the true $L_{sq}$ from confirmatory experiments. Similarly,
Table 5.4 presents the results for the loss statistic $L_{wc}$. Key features of these results are:

•  The  confirmatory  experiments  indicate  strategy  I  predicts  the  skews  accurately  enough 
to  give  predictions  very  close  to  the  actual  losses. 
Table 5.3  Predicted best three circuit designs for the squared-error loss statistic.

Experiment     Modeling             Best w on 3^6 Cube        Predicted   Actual

60 runs        skews via (5.7)       0   0  -1   1   1   1      -0.90     -0.90
                                     1   1  -1   1   1   1      -0.76     -0.77
                                    -1  -1  -1   1   1   1      -0.68     -0.72

40 x 5 runs    L_sq via (5.10)      -1   1  -1   1   0   1      -0.38     -0.50
                                    -1   0  -1   1   0   1      -0.31     -0.61
                                    -1   1  -1   1  -1   1      -0.30     -0.41

40 x 5 runs    skews via (5.7)       0   0  -1   1   1   1      -0.89     -0.90
                                     1   1  -1   1   1   1      -0.76     -0.77
                                    -1  -1  -1   1   1   1      -0.69     -0.72

Table 5.4  Predicted best three circuit designs for the worst-case loss statistic.

Experiment     Modeling             Best w on 3^6 Cube        Predicted   Actual

60 runs        skews via (5.7)       0   0  -1   1   1   1       0.50      0.53
                                    -1  -1  -1   1   1   1       0.56      0.51
                                     1   1  -1   1   1   1       0.63      0.66

40 x 5 runs    L_wc via (5.10)      -1   0  -1   0   0   1       0.93      1.07
                                    -1   0  -1   0   1   1       1.05      1.17
                                    -1   1  -1   0   0   1       1.06      1.36

40 x 5 runs    skews via (5.7)       0   0  -1   1   1   1       0.50      0.53
                                    -1  -1  -1   1   1   1       0.52      0.51
                                     1   1  -1   1   1   1       0.63      0.66
•  Strategy I gives more reliable predictions and superior circuit designs (smaller actual
losses) than strategy II. These differences are of importance; for example, a reduction
of the worst-case skew from 1.07 ns to 0.53 ns is of practical significance.

•  Strategies I and III give virtually identical results. This shows that a well-chosen,
small experiment can be entirely adequate.

•  Comparing the results for all three strategies indicates the superiority of modeling the
skews rather than modeling the loss statistics directly.

•  Fitting additive models (ignoring interactions) to the loss statistics from a (40 x 5)-run,
crossed-array experiment and optimizing the level for each transistor width separately
produces even worse predictions. For example, the predicted best squared-error loss is
-2.39, but the actual loss computed from a confirmation experiment is -0.58.

•  Strategy I gives the same best-three circuit designs for the two loss statistics. This is
some indication that even better performance might be obtained from another experiment
with different ranges for the last four transistor widths. This is borne out by the
results for minimizing the predicted loss statistics over the continuous region $[-1,1]^6$
rather than the discrete region $\{-1,0,1\}^6$. Such optimization of the predictor from
strategy I leads to w = (-0.07, 0.10, -1.00, 1.00, 1.00, 1.00), so that the last four
optimal widths are on the boundary. The implication is that going beyond the boundary
could lead to further improvement.
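The grid search behind Tables 5.3 and 5.4 can be sketched as an exhaustive ranking of all $3^6 = 729$ candidate designs by predicted loss. The predictor below, `toy_predictor`, is a hypothetical placeholder for a fitted loss predictor, not the model fitted in this chapter. It is chosen so that its unconstrained minimum lies outside the cube in the last four coordinates, mimicking the boundary behavior noted above.

```python
from itertools import product

def best_designs(predict_loss, levels=(-1, 0, 1), n_factors=6, n_best=3):
    """Rank every grid point by predicted loss and return the n_best smallest."""
    grid = product(levels, repeat=n_factors)          # 3**6 = 729 candidate designs
    ranked = sorted(grid, key=predict_loss)
    return ranked[:n_best]

def toy_predictor(w):
    """Hypothetical fitted loss model, used only to make the sketch runnable."""
    target = (0.0, 0.0, -1.5, 1.5, 1.5, 1.5)
    return sum((wi - ti) ** 2 for wi, ti in zip(w, target))

for w in best_designs(toy_predictor):
    print(w, round(toy_predictor(w), 2))
```

Because each prediction is a cheap function evaluation, the full enumeration costs far less than a single circuit simulation.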
5.4.  Discussion

One  reason  why  our  proposed  method  gives  reliable  predictions  with  few  observations 
here  is  that  the  skews  admit  simple  models.  Moreover,  engineering  understanding  of  the 
underlying  circuit  performance  facilitates  model  identification.  Another,  more  general  advan¬ 
tage  of  modeling  the  underlying  performance  rather  than  a  loss  statistic  is  that  collapsing  the 
data  to  a  loss  statistic  could  hide  important  relationships  in  the  data. 

Clearly,  more  empirical  experience  is  required  to  determine  the  general  usefulness  of  the 
proposed  strategy.  In  another  application,  a  sense-amplifier  circuit  [47],  the  proposed  method 
with  60  observations  gives  more  accurate  predictions  of  the  losses  than  modeling  the  loss 
statistics  with  a  crossed  array  experiment  of  200  runs.  Again,  about  two-thirds  of  the  obser¬ 
vations  are  saved.  In  this  example  there  is  little  difference  in  the  actual  performances  of  the 
optimized  circuit  designs. 

We  used  a  computer-aided  statistical  design  package  to  automatically  generate  the  exper¬ 
imental  designs.  This  kind  of  tool  avoids  many  of  the  complications  often  experienced  when 
using  catalogued  experimental  designs.  For  example,  the  user  can  concentrate  on  the  model 
without  worrying  about  matching  the  desired  interactions  to  the  aliasing  structure.  We  believe 
that  the  widespread  adoption  of  these  tools  and  increased  attention  to  modeling  rather  than 
combinatorics  would  encourage  the  experimentation  needed  to  improve  quality.  Similarly, 
model fitting and the minimization of predictions over a grid are straightforward using, for
example,  SAS  [29]. 


These methods can be extended to physical experiments with random error. In such
experiments noise (process) variability is due to unmodeled sources (measurement error,
omitted variables, etc.) as well as the manipulation of the noise variables. If the unmodeled
sources are unimportant or lead to a noise component with constant variance, there is little
technical difficulty in extending our methods. Nonconstant variance requires further study,
however.
CHAPTER  6.

CONCLUSIONS 


In this dissertation we have presented a new circuit performance modeling approach to
statistical circuit design. It is demonstrated that fitted models of the circuit performances,
obtained from a statistically designed experiment incurring a small number of circuit simulator
runs, can be used as accurate and computationally inexpensive substitutes for the circuit
simulator for statistical design and analysis. This modeling approach has been applied to
parametric yield prediction [2,3], yield optimization [7,8], and achievement of off-line
quality control [47,48]. These algorithms have been implemented in the iEDISON design
package [37].

In Chapter 2 we reviewed several methods to represent the distribution of the MOSFET
device parameters. It is concluded that the conventional best- and worst-case analysis
approach to estimating the range of the device parameter variations is inadequate. An alternative
method that uses four critical MOSFET parameters to represent the device parameter
distribution is presented. In this method, the other device parameters are treated as functions of the
critical parameters. The critical parameter approach is used in our yield prediction and
optimization examples in Chapters 3 and 4.

In Chapter 3 we introduced a new method for circuit performance modeling and
parametric yield prediction. We modeled the circuit performances by computationally
inexpensive approximating functions of the critical parameters. An experimental design that takes
replicated observations and a statistical F-test procedure are used to assess the adequacy of
the four critical parameters and the performance model. It is found that sometimes there are
more than four critical parameters; engineering knowledge has been used to identify additional
critical parameters. A systematic method for critical parameter selection may lead to more
reliable results.

The examples in Chapter 3 showed that the circuit performance and parametric yield of
many digital circuits can be modeled accurately from about 10 runs of the circuit simulator, if
the critical parameters are sufficient. Usually a circuit performance, such as delay or
power, can be well approximated by a linear equation with four or five critical parameters;
hence the small number of runs.

The statistical modeling of analog circuit performances is more complicated. Analog
circuit performances, such as gain and bandwidth, usually depend strongly on the operating
region of the MOSFET transistors (off, linear, or saturation region). If the statistical
variations cause a MOSFET to operate outside the region specified by the circuit designer, there
may be a dramatic shift in the performance. A linear or quadratic approximation to the
performance function may be inadequate. As a result, statistical modeling is not automatic, and
engineering knowledge is often needed to identify and fit the model. With careful modeling,
however, accurate approximations to the performances can be developed with relatively few
circuit simulations.

In Chapter 4 we presented a new parametric yield optimization method. Yield gradient
methods for parametric yield maximization usually compute the yield gradient from many
circuit simulator runs; this often leads to a large number of runs and can hide important relationships
in the data. Our method divides yield optimization into two separate steps: (i) model the
circuit performance by an approximating function of the inputs to the circuit simulator; (ii) using the
computationally inexpensive models, optimize the yield with respect to the designable circuit
parameters using a proven optimization technique. This approach optimizes the parametric
yield with about 100 runs and avoids the complicated yield gradient computation.

In Chapter 5 we applied the circuit performance modeling method to achieve off-line
quality control. The Taguchi method for off-line quality control is reviewed. Taguchi’s method
models the loss statistic of a product, and typically requires a large number of runs. Our method uses
fitted models of the circuit performances to predict Taguchi’s loss statistic and avoids the
nested Taguchi experiments. We showed by example that Taguchi’s design objectives can be
met by our method with about one-third of the runs.

Clearly,  more  experience  is  needed  to  assess  the  circuit  performance  modeling  approach. 
There  is  substantial  industrial  interest  in  this  design  methodology,  and  we  expect  our  methods 
to  be  tested  on  larger  circuits  in  the  future. 

We now present some topics for future research:

•  Statistical parameter extraction. The extraction of the device parameters from
measured I-V characteristics is usually formulated as curve fitting by non-linear least
squares. A more rigorous extraction methodology that states the confidence limits on
the parameter estimates is needed.

•  Screening of the critical parameters. In this research the four critical MOSFET
parameters in [1] are used. These four parameters are found to be inadequate in some
situations, however. The more rigorous screening techniques summarized in [19] may
lead to better results.
•  Mismatch in the device parameters. The device parameter variations within a die are
assumed to be insignificant in our research. In high performance analog circuits, intra-die
parameter mismatch may lead to significant degradation of the circuit performances. A
new methodology to design and analyze circuits with parameter mismatch is desirable.
APPENDIX  A.


THE  DESIGN  OF  EXPERIMENT


A.1.  Introduction

For a given physical system (circuit), denote the inputs to the system by X and the
response by y. The relation between the response and the inputs is

$$y = h(X) + \epsilon \qquad (A.1)$$

where h(X) is a systematic function of X, and $\epsilon$ is a random error due to variations in the
experimental conditions. Typically, $\epsilon$ is assumed to take a Gaussian distribution with zero mean
and variance $\sigma^2$.

The problem is to

(i) Identify a model of the system response.

(ii) Select inputs $X_1, \ldots, X_N$ to run the experiment, such that the approximation $\hat{y}(X)$,
obtained from fitting $y(X_1), \ldots, y(X_N)$ to the assumed model, predicts the response
accurately (experimental design problem).

In a physical experiment that has randomness, the fitted model $\hat{y}$ is subject to two
sources of errors:

•  The error due to uncertainties in the observations (sampling, or variance error).

•  The error due to the systematic departure of the model from the actual response
function (lack of fit, or bias error).
Figure A.1 shows an extreme case in which the model and the actual response h(X) have
the same form. Due to the random error in the observations, the fitted model is different from
h(X) (variance error). For instance, sampling uncertainties lead to distinct fitted models,
$\hat{y}_I(X)$ and $\hat{y}_{II}(X)$, in Figure A.1. As the sample size increases, the variance error will
decrease to zero.

Figure A.2 shows the other extreme case in which there is no random error, e.g., an
experiment conducted on a computer. In this case, least-squares regression becomes curve
fitting. The error in the fitted model $\hat{y}(X)$ will not decrease to zero, even for a large sample
size, due to the systematic difference between the model and h(X).
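The all-bias case can be illustrated numerically. In the sketch below, a straight line is fitted by least squares to noise-free samples of $h(x) = x^2$ on [-1, 1]; this response and model are illustrative choices, not taken from Figure A.2. The mean squared error of the fit settles near 4/45 (about 0.089) instead of shrinking as the number of samples grows.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

def mean_squared_bias(n):
    """Fit a line to n noise-free samples of h(x) = x**2 on [-1, 1]."""
    xs = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    ys = [x * x for x in xs]
    b0, b1 = fit_line(xs, ys)
    return sum((b0 + b1 * x - y) ** 2 for x, y in zip(xs, ys)) / n

for n in (5, 50, 500):
    print(n, round(mean_squared_bias(n), 4))
```

Adding more runs improves only the variance error; a bias error of this kind can be reduced only by changing the model or the placement of the design points.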


A.2.  Integrated  Mean-squared  Error  Criterion 

Box and Draper [49] introduced the integrated mean-squared error (IMSE) criterion for
the design of experiment. For a given X, the expected squared error of a fitted model is

$$MSE[\hat{y}(X)] = E[\hat{y}(X) - h(X)]^2. \qquad (A.2)$$

Integration of Eq. (A.2) over the experimental region R gives

$$\Omega \int_R E[\hat{y}(X) - h(X)]^2\,dX \qquad (A.3)$$

where $1/\Omega = \int_R dX$. Normalization of Eq. (A.3) with respect to the number of design points
$N_{OBS}$ and the error variance $\sigma^2$ yields the integrated mean-squared error (IMSE)

$$J = \frac{N_{OBS}\,\Omega}{\sigma^2}\int_R E[\hat{y}(X) - h(X)]^2\,dX. \qquad (A.4)$$

The integrated mean-squared error can be decomposed into two components
(J = V + B), the average variance

$$V = \frac{N_{OBS}\,\Omega}{\sigma^2}\int_R \mathrm{Var}[\hat{y}(X)]\,dX \qquad (A.5)$$

and the average squared bias

$$B = \frac{N_{OBS}\,\Omega}{\sigma^2}\int_R [E\hat{y}(X) - h(X)]^2\,dX. \qquad (A.6)$$
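The decomposition J = V + B follows from adding and subtracting $E\hat{y}(X)$ inside the square. A short derivation, using only the quantities already defined in this section, is sketched below.

```latex
\begin{align*}
E[\hat{y}(X) - h(X)]^2
  &= E\big[(\hat{y}(X) - E\hat{y}(X)) + (E\hat{y}(X) - h(X))\big]^2 \\
  &= \mathrm{Var}[\hat{y}(X)] + [E\hat{y}(X) - h(X)]^2
     + 2\,[E\hat{y}(X) - h(X)]\,E[\hat{y}(X) - E\hat{y}(X)] \\
  &= \mathrm{Var}[\hat{y}(X)] + [E\hat{y}(X) - h(X)]^2 .
\end{align*}
```

The cross term vanishes because $E[\hat{y}(X) - E\hat{y}(X)] = 0$. Multiplying by the normalizing factor and integrating over R gives the average variance and the average squared bias as the two components of J.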

An important point is that whereas V depends only on the location of the design points
$X_1, \ldots, X_N$, B depends on both the design points and the actual response function h(X).

To fit an accurate model, the experimental design $D = \{X_1, \ldots, X_N\}$ should be chosen
such that J is minimized. Welch [50] shows that if the systematic departure of the model
from the actual response surface, $|h(X) - E\hat{y}(X)|$, is bounded by $\Delta$ for all the X in the region
R, then the integrated mean-squared error is

$$J \le J'(D, \Delta/\sigma) = V(D) + N_{OBS}\,(\Delta/\sigma)^2\,B'(D), \qquad (A.7)$$

where B' depends only on the experimental design D, and not on the actual response function
h(X). This fact provides a guideline for selecting an optimal design that minimizes J'. Note
that J' depends on the relative magnitude of the systematic departure and the standard deviation
of the random error, $\Delta/\sigma$.

An "excursion" algorithm to select the design D that minimizes J' is presented in [51].
The design of experiment is formulated as a combinatorial optimization problem with cost
function J'. These design strategies [50] and [51] are implemented in the program ACED
(Algorithms for the Construction of Experimental Designs) [28].




A.3.  Example of Design Point Selection with ACED

Suppose for simplicity that there are only two input parameters $X_1$ and $X_2$. We represent
the ranges of $X_1$ and $X_2$ by the levels {-1,0,1}, and the region R by $3^2 = 9$ grid points, as
shown in Figure A.3.

The assumed model is

$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon \qquad (A.8)$$

where $\beta_0$, $\beta_1$, and $\beta_2$ are the unknown constants, and $\epsilon$ is a random error. The average
mean-squared error (am) criterion in ACED is used for the design of experiments. Table A.1
lists the experimental designs that minimize the average mean-squared error for five
variance-bias ratios ($\Delta/\sigma$ = 0, 0.31, 0.55, 1.25, and $\infty$), along with their properties. Each design
$D^*(\Delta/\sigma)$ is denoted in Column 2 by the number of runs at a grid point. For example, the design
$D^*(0)$ takes three runs at $(X_1, X_2) = (-1,1)$, and two runs at each of the other three corners of R.

A.3.1.  All-variance design

In this case $\Delta/\sigma = 0$, and the variance error V is dominant. ACED designs an experiment
$D^*(\Delta/\sigma = 0)$ that minimizes V and distributes the points as equally as possible among the
corners of the design region.
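As a hedged numerical check, the sketch below evaluates the average variance V of the first-order model for a design given as run counts on the 3 x 3 grid. It assumes that V is the prediction variance averaged over the nine grid points and scaled by the number of runs divided by the error variance; this is an interpretation of the criterion, not code from ACED, and the function names are hypothetical. Under that assumption the all-variance design with three runs at (-1, 1) gives about 2.39 and the full $3^2$ factorial gives 3.00.

```python
def inverse_3x3(m):
    """Invert a 3x3 matrix using the adjugate formula."""
    (a, b, c), (d, e, f), (g, h, i) = m
    det = a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)
    adj = [[e * i - f * h, c * h - b * i, b * f - c * e],
           [f * g - d * i, a * i - c * g, c * d - a * f],
           [d * h - e * g, b * g - a * h, a * e - b * d]]
    return [[x / det for x in row] for row in adj]

# Grid points listed row by row; the top row is X2 = 1, the left column is X1 = -1.
GRID = [(x1, x2) for x2 in (1, 0, -1) for x1 in (-1, 0, 1)]

def average_variance(counts):
    """V for the model y = b0 + b1*X1 + b2*X2; counts[k] is the number of runs at GRID[k]."""
    n = sum(counts)
    rows = [(1.0, x1, x2) for (x1, x2), c in zip(GRID, counts) for _ in range(c)]
    xtx = [[sum(r[p] * r[q] for r in rows) for q in range(3)] for p in range(3)]
    inv = inverse_3x3(xtx)
    total = 0.0
    for x1, x2 in GRID:
        f = (1.0, x1, x2)
        total += sum(f[p] * inv[p][q] * f[q] for p in range(3) for q in range(3))
    return n * total / len(GRID)

print(round(average_variance([3, 0, 2, 0, 0, 0, 2, 0, 2]), 2))   # corner-weighted design
print(round(average_variance([1] * 9), 2))                        # one run at every grid point
```

Concentrating runs at the corners lowers V, which is why the all-variance optimum avoids the interior of the region.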

A.3.2.  All-bias  design 

If there is no random error in the experiment, $\Delta/\sigma = \infty$. ACED designs an experiment
$D^*(\Delta/\sigma = \infty)$ that minimizes the bias component of the integrated mean-squared error and
distributes the points evenly across the design region.


Table A.1  J'-optimal designs and properties (N_OBS = 9).

Δ/σ      D*(Δ/σ)        V       B'     e(D,0)   e(D,∞)   e_min(D)

0        3  0  2      2.39    0.922     ...      ...       ...
         0  0  0
         2  0  2

0.31     2  1  2      2.44    0.858     ...      ...       ...
         0  0  0
         2  0  2

0.55     2  1  1      2.64    0.750     90.5     88.9      88.9
         1  0  1
         1  1  1

1.25     1  1  1      3.00    0.667     79.7    100.0      79.7
         1  1  1
         1  1  1

∞        1  1  1      3.00    0.667     79.7    100.0      79.7
         1  1  1
         1  1  1

A.3.3.  Selection of a robust design

When the relative importance of the variance and bias errors is unknown, ACED
searches for a "robust" design good over a wide range of $\Delta/\sigma$. We define the percent
efficiency of a design D relative to an optimized design $D^*$ for $\Delta/\sigma$ as

$$e(D, \Delta/\sigma) = \frac{J'(D^*, \Delta/\sigma)}{J'(D, \Delta/\sigma)} \times 100\%. \qquad (A.9)$$

For instance, e(D, 0) measures the efficiency of D relative to an optimal all-variance design,
$D^*(\Delta/\sigma = 0)$, and e(D, ∞) measures the efficiency of D relative to an optimal all-bias design,
$D^*(\Delta/\sigma = \infty)$.

A robust design D should have a large efficiency over a large range of variance and bias
error ratios. This is measured by

$$e_{min}(D) = \min_{\Delta/\sigma} e(D, \Delta/\sigma), \quad 0 \le \Delta/\sigma \le \infty. \qquad (A.10)$$

Furthermore, it can be shown that $e_{min}(D)$ is the efficiency with respect to the all-variance or
all-bias designs, e(D,0) or e(D,∞) [50]. Therefore, a robust design D should maximize

$$e_{min}(D) = \min[\,e(D,0),\ e(D,\infty)\,]. \qquad (A.11)$$

The minimum efficiency of the J'-optimal design at various values of $\Delta/\sigma$ is shown in the last
column of Table A.1. The ACED program chooses $D^*(0.55)$ as the most robust design because its minimum
efficiency of 88.9% is larger than the minimum efficiency of the other designs.
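The selection rule can be sketched directly from the V and B' values printed in Table A.1. The fragment assumes that the two limiting efficiencies reduce to ratios of V (for $\Delta/\sigma = 0$) and of B' (for $\Delta/\sigma = \infty$), which is what Eq. (A.9) gives in those limits. The names `DESIGNS` and `robustness` are hypothetical.

```python
# (ratio label, V, B') as printed in Table A.1
DESIGNS = [("0", 2.39, 0.922), ("0.31", 2.44, 0.858), ("0.55", 2.64, 0.750),
           ("1.25", 3.00, 0.667), ("inf", 3.00, 0.667)]

def robustness(designs):
    """Return {label: (e(D,0), e(D,inf), e_min)} in percent."""
    v_best = min(v for _, v, _ in designs)      # V of the all-variance optimum
    b_best = min(b for _, _, b in designs)      # B' of the all-bias optimum
    out = {}
    for label, v, b in designs:
        e0 = 100.0 * v_best / v
        e_inf = 100.0 * b_best / b
        out[label] = (e0, e_inf, min(e0, e_inf))
    return out

eff = robustness(DESIGNS)
most_robust = max(eff, key=lambda k: eff[k][2])
print(most_robust, round(eff[most_robust][2], 1))
```

The maximin choice balances the two extremes: a design tuned entirely for variance or entirely for bias loses more efficiency at the opposite extreme than the intermediate design does at either.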

A.4.  Design  and  Analysis  of  Computer  Experiments 

The classical experimental designs, associated with Box and co-workers [24,26], were
developed for physical experiments that have random errors. However, in circuit simulation,
the computer code always produces the same outputs for the same inputs.

As a result, there are two ways to design and analyze a computer-simulation experiment:

(1)  In  addition  to  the  experimental  factors  X,  vary  the  other  inputs  to  the  computer  code 
according  to  their  distribution.  In  the  presence  of  random  errors,  classical  designs  can  be 
used  for  the  experiment. 

(2)  With  the  exception  of  the  experimental  factors  X,  fix  all  the  inputs  to  the  computer  code 
at  their  nominal  values.  In  the  absence  of  random  errors,  an  all-bias  design  from  ACED 
can  be  used  for  the  experiment. 

The  design  and  analysis  of  deterministic  computer-simulation  experiments  are  discussed 
in  [35]: 

•  The  absence  of  random  error  allows  the  complexity  of  the  actual  response  surface  to 
emerge. 

•  The  adequacy  of  the  fitted  response  surface  model  is  determined  solely  by  the  systematic 
departure  of  the  model  from  the  actual  response. 
•  There  is  no  obvious  statistical  basis  for  estimating  the  uncertainty. 

•  Conventional notions of experimental unit, blocking, replication, and randomization
are irrelevant.

As  a  consequence,  it  is  unclear  if  the  current  methods  [24]  for  the  design  and  analysis  of 
physical  experiments  are  ideal  for  computer  simulations.  However,  Sacks,  Welch,  Mitchell, 
and  Wynn  [35]  assert  that 

•  The  selection  of  inputs  at  which  to  run  a  computer  code  is  still  an  experimental  design 
problem. 

•  Statistical  principles  and  attitudes  to  data  analysis  are  helpful  however  the  data  are 
generated. 

•  There is uncertainty associated with predictions from the fitted models, and the
quantification of uncertainty is a statistical analysis problem.

A.5.  Latin  Hypercube  Design 

McKay, Beckman, and Conover [25] were among the first to explicitly consider the
statistical analysis of deterministic computer codes. They introduced Latin hypercube sampling,
and advocated it as an alternative to simple random sampling in Monte Carlo studies (see
Section 2.1). The objective of Latin hypercube sampling is to determine, for a known input
distribution, the distributions of the outputs.
STEP 1: Sample the range of each input variable $x_k$, k = 1,...,p, into N (evenly spaced)
points, and denote the samples by $x_{k1}, \ldots, x_{kN}$.

STEP 2: The samples $x_{kj}$, j = 1,...,N, will form the k-th component, k = 1,...,p, in $X_i$,
i = 1,...,N. The components of the various $X_i$'s are matched at random to spread
the samples evenly in the input space.
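The two steps can be sketched in a few lines of Python using only the standard library. This is an illustrative version, not the program used in this work: the function name `latin_hypercube` and the choice of coding every input on [-1, 1] are assumptions.

```python
import random

def latin_hypercube(n_runs, n_factors, seed=1234567):
    """Return n_runs design points on [-1, 1]**n_factors."""
    rng = random.Random(seed)
    # STEP 1: evenly spaced levels for every input variable
    levels = [-1.0 + 2.0 * i / (n_runs - 1) for i in range(n_runs)]
    columns = []
    for _ in range(n_factors):
        # STEP 2: match the levels to the runs at random, independently per factor
        column = levels[:]
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_runs)]

design = latin_hypercube(n_runs=12, n_factors=5)
for point in design:
    print(" ".join(f"{v:7.4f}" for v in point))
```

Each level of each factor appears in exactly one run, so every one-dimensional projection of the design covers the input range evenly.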

Steck, Iman, and Dahlgren [52] and Atwood [53] advocated using the Latin hypercube
sampling method for the design of computer experiments. The study in [52] shows Latin
hypercube designs (samples) are superior to classical fractional factorial designs [26] for
fitting response surfaces in several computer experiments.

Compared to designs from the average mean-squared error criterion in ACED, the Latin
hypercube designs are much cheaper to construct and they can easily handle problems of high
dimensions. For a small experiment, however, an all-bias design from ACED may outperform a Latin
hypercube design that has the same number of runs.

A FORTRAN program to generate Latin hypercube designs is given below:

      program latin

c     m    = number of independent factors
c     nobs = number of runs in the experiment
c     lhs(nobs,m) = matrix of the Latin hypercube design
c     (each column is a random permutation of the level indices)

      parameter (m=5, nobs=12)
      integer iseed
      integer lhs(nobs,m), iper(nobs)
      real x(nobs), delta
      integer i, j

c     IMSL routines: rnset sets the seed, rnper returns a random
c     permutation of the integers 1,...,nobs
      external rnper, rnset

c...Discretize the [-1,1] interval into nobs evenly spaced levels

      delta = 2.0 / (nobs-1)
      do 10 i = 0, nobs-1, 1
   10 x(i+1) = -1.0 + i * delta

c...Set up IMSL, initialize the random number generator

      iseed = 1234567
      call rnset(iseed)

c...Random permutation for column j=1,...,m

      do 20 j = 1, m, 1
         call rnper(nobs, iper)
         do 30 i = 1, nobs, 1
            lhs(i,j) = iper(i)
   30    continue
   20 continue

c...Print out the Latin hypercube design

      write (6,800) m, nobs
      do 50 i = 1, nobs, 1
         write (6,900) i, (x(lhs(i,j)), j=1,m,1)
   50 continue
      stop

  800 format(' Latin hypercube design of ', i5, ' factors with', i5,
     1       ' observations'//)
  900 format(i2,2x,100f10.4)

      end
APPENDIX  B.

MODEL  ASSESSMENT

B.1.  A  Statistical  F-test  Procedure  for  Model  Assessment

For a given regression model, denote the least-squares prediction of the i-th data point,
$y(x_i)$, by $\hat{y}(x_i)$ for i = 1,...,N. The procedure in [30] considers a fitted model to be adequate
if the following conditions are satisfied:

(a) The variation in $y(x_i)$, i = 1,...,N, "explained" by $\hat{y}(x)$ is substantially larger than the
error in the regression.

(b) The lack of fit (LOF) of data to the model is insignificant.


The sum of squares of the variations explained by the model is

$$SS_{Model} = \sum_{i=1}^{N} [\hat{y}(x_i) - y_{avg}]^2, \qquad (B.1)$$

where $y_{avg}$ is the average of $y(x_i)$, i = 1,...,N.

The error sum of squares is

$$SS_{Error} = \sum_{i=1}^{N} [y(x_i) - \hat{y}(x_i)]^2. \qquad (B.2)$$

The normalized regression sum of squares is

$$MS_{Model} = \frac{SS_{Model}}{M}, \qquad (B.3)$$

and the normalized sum of squares of the errors is

$$MS_{Error} = \frac{SS_{Error}}{N - M - 1}. \qquad (B.4)$$

A statistical F-test is introduced in [30] to test condition (a). The goal is to determine
whether the expected variation due to the model, $E[MS_{Model}]$, is $\gamma_0^2$ times larger than
the expected error, $E[MS_{Error}]$ (typically $\gamma_0$ = 2 to 4).

The F-statistic of the regression,

$$F_1 = \frac{MS_{Model}}{MS_{Error}}, \tag{B.5}$$

is compared with the corresponding critical F-value, $F_{1,cr}$. The critical value $F_{1,cr}$ is the $1 - \alpha$
percentile point of a noncentral F-distribution with M and N - M - 1 degrees of freedom and
noncentrality parameter $\gamma_0^2$. If $F_1$ is larger than $F_{1,cr}$, condition (a) is considered
satisfied.
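
As an illustration of Eqs. (B.1) to (B.5), the following Python sketch computes the sums of squares and $F_1$ for a straight-line fit. The data values, the function names `fit_line` and `regression_f_statistic`, and the choice M = 1 (one regression term besides the constant) are assumptions made for this example and are not taken from [30]. The critical value $F_{1,cr}$ is not computed; it would have to come from a noncentral F table or a statistics library.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def regression_f_statistic(x, y, m=1):
    """Return (SS_Model, SS_Error, F1) following Eqs. (B.1)-(B.5).

    m is the model degrees of freedom M (assumed here to be 1 for a line).
    """
    n = len(y)
    b0, b1 = fit_line(x, y)
    y_hat = [b0 + b1 * xi for xi in x]
    y_avg = sum(y) / n
    ss_model = sum((yh - y_avg) ** 2 for yh in y_hat)            # (B.1)
    ss_error = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # (B.2)
    ms_model = ss_model / m                                      # (B.3)
    ms_error = ss_error / (n - m - 1)                            # (B.4)
    return ss_model, ss_error, ms_model / ms_error               # (B.5)


# Made-up data with a nearly linear trend.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
ss_model, ss_error, f1 = regression_f_statistic(x, y, m=1)
print(ss_model, ss_error, f1)
```

For a least squares fit that includes a constant term, `ss_model + ss_error` equals the total sum of squares of the data about its mean, which gives a simple consistency check on the two quantities.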

To  assess  the  lack  of  fit  in  the  model,  we  partition  the  sum  of  squares  error  into  "pure 
error"  (PE)  and  "lack  of  fit"  (LOF)  components.  The  "pure  error"  is  due  to  random  variations 
only. If there are k replicated runs at x, $y_{(1)}, \ldots, y_{(k)}$, the "pure error" sum of squares is

$$\sum_{u=1}^{k} [y_{(u)}(x) - \bar{y}(x)]^2, \tag{B.6}$$

where $\bar{y}(x)$ is the average response at x. The total pure error sum of squares, $SS_{PE}$, is the sum
of the individual "pure error" terms over all x. We denote the degrees of freedom due to
pure error by K.

The  sum  of  squares  due  to  the  lack  of  fit  is 

$$SS_{LOF} = SS_{Error} - SS_{PE}. \tag{B.7}$$

It has N - M - K - 1 degrees of freedom. The normalized sums of squares due to pure error
and lack of fit are denoted by $MS_{PE}$ and $MS_{LOF}$, respectively.


The F-statistic

$$F_2 = \frac{MS_{LOF}}{MS_{PE}} \tag{B.8}$$

estimates the importance of the lack of fit. We compare $F_2$ with $F_{2,cr}$. The critical value
$F_{2,cr}$ is the $1 - \alpha$ percentile point of a central F-distribution with N - M - K - 1 and K
degrees of freedom (numerator and denominator, respectively). If $F_2$ is less than $F_{2,cr}$, condition (b) is considered satisfied. The
relations between the sums of squares and F-statistics are summarized in an analysis of variance
(ANOVA) table (Table B.1).
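
The partition of the error sum of squares in Eqs. (B.6) to (B.8) can be sketched as follows. The data set, the function names, and the choice M = 1 are hypothetical. Replicated runs are identified by exactly equal values of x, which is an assumption of this sketch and not a requirement stated in the text.

```python
from collections import defaultdict


def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def lack_of_fit(x, y, m=1):
    """Return (SS_PE, SS_LOF, K, LOF degrees of freedom, F2)."""
    n = len(y)
    b0, b1 = fit_line(x, y)
    ss_error = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    # Group replicated runs by their input value.
    groups = defaultdict(list)
    for xi, yi in zip(x, y):
        groups[xi].append(yi)

    ss_pe = 0.0
    k_df = 0  # K: pure error degrees of freedom
    for runs in groups.values():
        mean = sum(runs) / len(runs)
        ss_pe += sum((r - mean) ** 2 for r in runs)  # (B.6), summed over x
        k_df += len(runs) - 1

    ss_lof = ss_error - ss_pe                        # (B.7)
    lof_df = n - m - k_df - 1
    f2 = (ss_lof / lof_df) / (ss_pe / k_df)          # (B.8)
    return ss_pe, ss_lof, k_df, lof_df, f2


# Made-up data: two runs at each of four inputs, with a curved response
# that a straight line cannot follow.
x = [1, 1, 2, 2, 3, 3, 4, 4]
y = [1.0, 1.2, 3.9, 4.1, 9.1, 8.9, 16.0, 16.2]
ss_pe, ss_lof, k_df, lof_df, f2 = lack_of_fit(x, y, m=1)
print(ss_pe, ss_lof, k_df, lof_df, f2)
```

In this example the replicates agree closely while the line misses the curvature, so the lack-of-fit mean square is much larger than the pure error mean square and $F_2$ is large.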


The  F-tests  have  three  possible  outcomes: 


Case 1. If $F_1 > F_{1,cr}$, condition (a) is satisfied. The fitted model $\hat{y}(x)$ is considered
adequate.

Case 2. If $F_1 < F_{1,cr}$ and $F_2 < F_{2,cr}$, condition (a) is violated but the lack of fit is
insignificant. Hence the fitted model is declared inadequate. The model inadequacy
may be caused by either the small sample size or the effects of parameters that are
not included in the model.

Case 3. If $F_1 < F_{1,cr}$ and $F_2 > F_{2,cr}$, condition (a) is violated and the lack of fit is
significant. The fitted model is considered inadequate, and a more complex model
may be needed to represent the response surface.
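
The three outcomes can be written as a small decision function. This is a sketch only: the name `assess_model` is hypothetical, the critical values are passed in and not computed, and the handling of exact ties (which the three cases do not cover) is an assumption made here.

```python
def assess_model(f1, f1_cr, f2, f2_cr):
    """Return the case number (1, 2 or 3) of the F-test procedure."""
    if f1 > f1_cr:
        # Condition (a) holds: the fitted model is considered adequate.
        return 1
    if f2 < f2_cr:
        # Condition (a) fails, lack of fit insignificant: inadequate,
        # possibly too few samples or missing parameters.
        return 2
    # Condition (a) fails and lack of fit is significant (ties land here
    # by assumption): a more complex model may be needed.
    return 3


# Hypothetical statistics and critical values, chosen to reach each branch.
print(assess_model(f1=25.0, f1_cr=12.0, f2=1.1, f2_cr=4.5))
print(assess_model(f1=5.0, f1_cr=12.0, f2=1.1, f2_cr=4.5))
print(assess_model(f1=5.0, f1_cr=12.0, f2=9.0, f2_cr=4.5))
```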
Table B.1 ANOVA table for the regression model.


B.2.  Assessing  the  Goodness  of  Fit 

For a given regression model, denote the least squares prediction of the i-th data point,
$y(x_i)$, by $\hat{y}(x_i)$ for i = 1,...,N. Then the squared error of prediction is $[\hat{y}(x_i) - y(x_i)]^2$. The
$R^2$-statistic relates the total of these squared errors to the variability in the data:

$$R^2 = 1 - \frac{\sum_{i=1}^{N} [\hat{y}(x_i) - y(x_i)]^2}{\sum_{i=1}^{N} [y(x_i) - \bar{y}]^2}, \tag{B.9}$$

where $\bar{y}$ is the mean of $y(x_1), \ldots, y(x_N)$. It can be shown that $0 \le R^2 \le 1$, with larger
values suggesting better agreement between the predictions and the data.
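
A minimal Python sketch of Eq. (B.9) for a straight-line fit is given below. The data and the function names `fit_line` and `r_squared` are made up for illustration.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared(x, y):
    """R^2 of Eq. (B.9): one minus squared errors over data variability."""
    b0, b1 = fit_line(x, y)
    y_bar = sum(y) / len(y)
    sq_err = sum((b0 + b1 * xi - yi) ** 2 for xi, yi in zip(x, y))
    variability = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - sq_err / variability


# Made-up data with a nearly linear trend.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
r2 = r_squared(x, y)
print(r2)
```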

One problem with $R^2$ is that it tends to overestimate the predictive power of a regression
model, because the same data are used to fit and to test the regression. A more stringent test
is based on predicting $y(x_i)$ by $\hat{y}_{-i}(x_i)$, where $\hat{y}_{-i}(x_i)$ is the least squares prediction based on
all the data except the i-th case. This leads to

$$R^2_{PRS} = 1 - \frac{\sum_{i=1}^{N} [\hat{y}_{-i}(x_i) - y(x_i)]^2}{\sum_{i=1}^{N} [y(x_i) - \bar{y}]^2}. \tag{B.10}$$

Again, an $R^2_{PRS}$ value of 1 indicates perfect agreement between the predictor and the data,
but $R^2_{PRS}$ can be much less than $R^2$ (even less than 0!), indicating possibly poor predictive
capability and lack of fit.
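
The leave-one-out computation behind Eq. (B.10) can be sketched by refitting the model N times, each time omitting one case. The function names and both data sets are hypothetical; the second data set has no trend and is included to show that $R^2_{PRS}$ can fall below zero.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def r_squared_prs(x, y):
    """R^2_PRS of Eq. (B.10) using leave-one-out predictions."""
    y_bar = sum(y) / len(y)
    press = 0.0
    for i in range(len(y)):
        # Refit without the i-th case, then predict that case.
        b0, b1 = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (b0 + b1 * x[i] - y[i]) ** 2
    variability = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - press / variability


# Made-up data with a nearly linear trend: the line predicts well.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]
r2_prs_trend = r_squared_prs(x, y)

# Made-up data with no trend: the line has no predictive value.
y_flat = [2.0, 1.0, 2.0, 1.0, 2.0]
r2_prs_flat = r_squared_prs(x, y_flat)

print(r2_prs_trend, r2_prs_flat)
```

Refitting N times is the most direct reading of the definition. For linear least squares the same quantity can be obtained from a single fit using the leverage of each point, but that shortcut is not shown here.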